2025-05-07T19:42:32.8541976Z Current runner version: '2.323.0'
2025-05-07T19:42:32.8548078Z Runner name: 'i-061ecfb3f7340882c'
2025-05-07T19:42:32.8548987Z Machine name: 'ip-10-0-64-236'
2025-05-07T19:42:32.8551914Z ##[group]GITHUB_TOKEN Permissions
2025-05-07T19:42:32.8554320Z Contents: read
2025-05-07T19:42:32.8554881Z Metadata: read
2025-05-07T19:42:32.8555337Z Packages: read
2025-05-07T19:42:32.8556011Z ##[endgroup]
2025-05-07T19:42:32.8558005Z Secret source: None
2025-05-07T19:42:32.8558715Z Prepare workflow directory
2025-05-07T19:42:32.9188733Z Prepare all required actions
2025-05-07T19:42:32.9232625Z Getting action download info
2025-05-07T19:42:34.2942028Z Download action repository 'actions/checkout@v4' (SHA:11bd71901bbe5b1630ceea73d27597364c9af683)
2025-05-07T19:42:34.5732398Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-05-07T19:42:35.1473240Z Complete job name: build_artifact (x86, linux.24xlarge, genai, 3.13, 12.8.0, gcc)
2025-05-07T19:42:35.2263013Z A job started hook has been configured by the self-hosted runner administrator
2025-05-07T19:42:35.2369173Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-05-07T19:42:35.2377932Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-05-07T19:42:35.2378634Z ##[endgroup]
2025-05-07T19:42:36.3792727Z Runner Type: linux.24xlarge
2025-05-07T19:42:36.3793259Z Instance Type: c5.24xlarge
2025-05-07T19:42:36.3793570Z AMI Name: unknown
2025-05-07T19:42:36.3824772Z AMI ID: ami-071226ecf16aa7d96
2025-05-07T19:42:41.4648153Z ##[group]Checking docker version
2025-05-07T19:42:41.4661725Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}'
2025-05-07T19:42:41.4989550Z '1.44'
2025-05-07T19:42:41.5008560Z Docker daemon API version: '1.44'
2025-05-07T19:42:41.5009136Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}'
2025-05-07T19:42:41.5212565Z '1.44'
2025-05-07T19:42:41.5226065Z Docker client API version: '1.44'
2025-05-07T19:42:41.5233248Z ##[endgroup]
2025-05-07T19:42:41.5236512Z ##[group]Clean up resources from previous jobs
2025-05-07T19:42:41.5242584Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=3a7dad"
2025-05-07T19:42:41.5438109Z ##[command]/usr/bin/docker network prune --force --filter "label=3a7dad"
2025-05-07T19:42:41.5586655Z ##[endgroup]
2025-05-07T19:42:41.5587072Z ##[group]Create local container network
2025-05-07T19:42:41.5596392Z ##[command]/usr/bin/docker network create --label 3a7dad github_network_2fbdf5bf774c440b8886f7414b350d71
2025-05-07T19:42:41.8330978Z 02a8ac43c4973408ec0f861d4b9141e3c02b3cfe914d0e68cba0243eca1fd251
2025-05-07T19:42:41.8356018Z ##[endgroup]
2025-05-07T19:42:41.8386421Z ##[group]Starting job container
2025-05-07T19:42:41.8409836Z ##[command]/usr/bin/docker pull amazonlinux:2023
2025-05-07T19:42:42.0097619Z 2023: Pulling from library/amazonlinux
2025-05-07T19:42:42.0258251Z Digest: sha256:cb5b4c509d62ae388f674c139ae5e8281fc160c217d474445e912043e1941988
2025-05-07T19:42:42.0275307Z Status: Image is up to date for amazonlinux:2023
2025-05-07T19:42:42.0301086Z docker.io/library/amazonlinux:2023
2025-05-07T19:42:42.0398689Z ##[command]/usr/bin/docker create --name bc8a7aa379e24ad1bb0513de8877a55e_amazonlinux2023_b22b95 --label 3a7dad --workdir /__w/FBGEMM/FBGEMM --network github_network_2fbdf5bf774c440b8886f7414b350d71 --user root -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/ec2-user/actions-runner/_work":"/__w" -v "/home/ec2-user/actions-runner/externals":"/__e":ro -v "/home/ec2-user/actions-runner/_work/_temp":"/__w/_temp" -v "/home/ec2-user/actions-runner/_work/_actions":"/__w/_actions" -v "/home/ec2-user/actions-runner/_work/_tool":"/__w/_tool" -v "/home/ec2-user/actions-runner/_work/_temp/_github_home":"/github/home" -v "/home/ec2-user/actions-runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" amazonlinux:2023 "-f" "/dev/null"
2025-05-07T19:42:42.1471873Z 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184
2025-05-07T19:42:42.1503596Z ##[command]/usr/bin/docker start 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184
2025-05-07T19:42:42.9034332Z 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184
2025-05-07T19:42:42.9055553Z ##[command]/usr/bin/docker ps --all --filter id=565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 --filter status=running --no-trunc --format "{{.ID}} {{.Status}}"
2025-05-07T19:42:42.9218069Z 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 Up Less than a second
2025-05-07T19:42:42.9241814Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184
2025-05-07T19:42:42.9391017Z CI=true
2025-05-07T19:42:42.9391484Z HOME=/github/home
2025-05-07T19:42:42.9391816Z GITHUB_ACTIONS=true
2025-05-07T19:42:42.9392390Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-05-07T19:42:42.9413396Z ##[endgroup]
2025-05-07T19:42:42.9423410Z ##[group]Waiting for all services to be ready
2025-05-07T19:42:42.9425507Z ##[endgroup]
2025-05-07T19:42:42.9504087Z ##[group]Run yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which
2025-05-07T19:42:42.9505039Z yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which
2025-05-07T19:42:42.9505999Z shell: bash --noprofile --norc -e -o pipefail {0}
2025-05-07T19:42:42.9506440Z env:
2025-05-07T19:42:42.9506783Z PRELUDE: .github/scripts/setup_env.bash
2025-05-07T19:42:42.9507268Z BUILD_ENV: build_binary
2025-05-07T19:42:42.9507582Z BUILD_TARGET: genai
2025-05-07T19:42:42.9507908Z BUILD_VARIANT: cuda
2025-05-07T19:42:42.9508292Z BUILD_CUDA_VERSION: 12.8.0
2025-05-07T19:42:42.9508602Z ##[endgroup]
2025-05-07T19:42:44.4729748Z Amazon Linux 2023 repository 60 MB/s | 37 MB 00:00
2025-05-07T19:42:51.0405005Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:42:51.5941719Z Dependencies resolved. 2025-05-07T19:42:51.6114598Z Nothing to do. 2025-05-07T19:42:51.6115337Z Complete! 2025-05-07T19:42:51.8498037Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:42:51.9141438Z Dependencies resolved. 2025-05-07T19:42:51.9381496Z ======================================================================================== 2025-05-07T19:42:51.9382150Z Package Arch Version Repository Size 2025-05-07T19:42:51.9382926Z ======================================================================================== 2025-05-07T19:42:51.9383373Z Installing: 2025-05-07T19:42:51.9383914Z binutils x86_64 2.41-50.amzn2023.0.3 amazonlinux 5.3 M 2025-05-07T19:42:51.9384535Z findutils x86_64 1:4.8.0-2.amzn2023.0.2 amazonlinux 539 k 2025-05-07T19:42:51.9385163Z git x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 54 k 2025-05-07T19:42:51.9385875Z pciutils x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 93 k 2025-05-07T19:42:51.9386744Z sudo x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 1.3 M 2025-05-07T19:42:51.9387310Z tar x86_64 2:1.34-1.amzn2023.0.4 amazonlinux 879 k 2025-05-07T19:42:51.9387922Z wget x86_64 1.21.3-1.amzn2023.0.4 amazonlinux 779 k 2025-05-07T19:42:51.9388497Z which x86_64 2.21-26.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:51.9389021Z Installing dependencies: 2025-05-07T19:42:51.9389448Z cracklib x86_64 2.9.6-27.amzn2023.0.2 amazonlinux 82 k 2025-05-07T19:42:51.9390147Z cyrus-sasl-lib x86_64 2.1.27-18.amzn2023.0.3 amazonlinux 786 k 2025-05-07T19:42:51.9390896Z elfutils-debuginfod-client x86_64 0.188-3.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:51.9391529Z git-core x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 4.7 M 2025-05-07T19:42:51.9392627Z git-core-doc noarch 2.47.1-1.amzn2023.0.2 amazonlinux 2.8 M 2025-05-07T19:42:51.9393215Z gnutls x86_64 3.8.3-6.amzn2023.0.1 amazonlinux 1.1 M 
2025-05-07T19:42:51.9393813Z groff-base x86_64 1.22.4-7.amzn2023.0.2 amazonlinux 1.0 M 2025-05-07T19:42:51.9394471Z gzip x86_64 1.12-1.amzn2023.0.1 amazonlinux 160 k 2025-05-07T19:42:51.9395126Z hwdata noarch 0.384-1.amzn2023.0.3 amazonlinux 1.6 M 2025-05-07T19:42:51.9395741Z jansson x86_64 2.14-0.amzn2023 amazonlinux 46 k 2025-05-07T19:42:51.9396339Z kmod-libs x86_64 29-2.amzn2023.0.5 amazonlinux 62 k 2025-05-07T19:42:51.9396916Z less x86_64 608-2.amzn2023.0.2 amazonlinux 168 k 2025-05-07T19:42:51.9397635Z libcbor x86_64 0.7.0-3.amzn2023.0.2 amazonlinux 57 k 2025-05-07T19:42:51.9398233Z libdb x86_64 5.3.28-49.amzn2023.0.2 amazonlinux 756 k 2025-05-07T19:42:51.9398811Z libeconf x86_64 0.4.0-1.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:42:51.9399351Z libedit x86_64 3.1-38.20210714cvs.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:51.9399966Z libfdisk x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 153 k 2025-05-07T19:42:51.9400927Z libfido2 x86_64 1.10.0-2.amzn2023.0.2 amazonlinux 95 k 2025-05-07T19:42:51.9401635Z libmetalink x86_64 0.1.3-14.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:51.9539876Z libpwquality x86_64 1.4.4-6.amzn2023.0.2 amazonlinux 106 k 2025-05-07T19:42:51.9540710Z libsemanage x86_64 3.4-5.amzn2023.0.2 amazonlinux 121 k 2025-05-07T19:42:51.9541325Z libutempter x86_64 1.2.1-4.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:51.9541921Z nano x86_64 8.3-1.amzn2023 amazonlinux 706 k 2025-05-07T19:42:51.9542475Z ncurses x86_64 6.2-4.20200222.amzn2023.0.6 amazonlinux 394 k 2025-05-07T19:42:51.9543010Z nettle x86_64 3.10.1-1.amzn2023.0.1 amazonlinux 573 k 2025-05-07T19:42:51.9543566Z openldap x86_64 2.4.57-6.amzn2023.0.7 amazonlinux 256 k 2025-05-07T19:42:51.9544183Z openssh x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 454 k 2025-05-07T19:42:51.9544832Z openssh-clients x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 708 k 2025-05-07T19:42:51.9545468Z pam x86_64 1.5.1-8.amzn2023.0.4 amazonlinux 542 k 2025-05-07T19:42:51.9546026Z pciutils-libs x86_64 3.7.0-3.amzn2023.0.2 
amazonlinux 41 k 2025-05-07T19:42:51.9546644Z perl-AutoLoader noarch 5.74-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:51.9547233Z perl-B x86_64 1.80-477.amzn2023.0.6 amazonlinux 179 k 2025-05-07T19:42:51.9547802Z perl-Carp noarch 1.50-458.amzn2023.0.2 amazonlinux 29 k 2025-05-07T19:42:51.9548438Z perl-Class-Struct noarch 0.66-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:51.9549162Z perl-Data-Dumper x86_64 2.174-460.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:51.9549795Z perl-Digest noarch 1.20-1.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:51.9550392Z perl-Digest-MD5 x86_64 2.58-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:51.9551062Z perl-DynaLoader x86_64 1.47-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:51.9551991Z perl-Encode x86_64 4:3.15-462.amzn2023.0.2 amazonlinux 1.7 M 2025-05-07T19:42:51.9552607Z perl-Errno x86_64 1.30-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:51.9553278Z perl-Error noarch 1:0.17029-5.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:51.9553832Z perl-Exporter noarch 5.74-459.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:51.9554378Z perl-Fcntl x86_64 1.13-477.amzn2023.0.6 amazonlinux 21 k 2025-05-07T19:42:51.9554966Z perl-File-Basename noarch 2.85-477.amzn2023.0.6 amazonlinux 18 k 2025-05-07T19:42:51.9555587Z perl-File-Find noarch 1.37-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:51.9556160Z perl-File-Path noarch 2.18-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:51.9556763Z perl-File-Temp noarch 1:0.231.100-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:51.9557468Z perl-File-stat noarch 1.09-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:51.9558089Z perl-FileHandle noarch 2.03-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:51.9558732Z perl-Getopt-Long noarch 1:2.52-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:51.9559402Z perl-Getopt-Std noarch 1.12-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:51.9559998Z perl-Git noarch 2.47.1-1.amzn2023.0.2 amazonlinux 42 k 
2025-05-07T19:42:51.9560562Z perl-HTTP-Tiny noarch 0.078-1.amzn2023.0.3 amazonlinux 56 k 2025-05-07T19:42:51.9561134Z perl-IO x86_64 1.43-477.amzn2023.0.6 amazonlinux 87 k 2025-05-07T19:42:51.9561704Z perl-IPC-Open3 noarch 1.21-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:51.9562274Z perl-MIME-Base64 x86_64 3.16-2.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:51.9562873Z perl-Net-SSLeay x86_64 1.94-1.amzn2023.0.1 amazonlinux 392 k 2025-05-07T19:42:51.9563427Z perl-POSIX x86_64 1.94-477.amzn2023.0.6 amazonlinux 97 k 2025-05-07T19:42:51.9564018Z perl-PathTools x86_64 3.78-459.amzn2023.0.2 amazonlinux 85 k 2025-05-07T19:42:51.9564601Z perl-Pod-Escapes noarch 1:1.07-458.amzn2023.0.2 amazonlinux 20 k 2025-05-07T19:42:51.9565215Z perl-Pod-Perldoc noarch 3.28.01-459.amzn2023.0.3 amazonlinux 84 k 2025-05-07T19:42:51.9565827Z perl-Pod-Simple noarch 1:3.42-2.amzn2023.0.2 amazonlinux 215 k 2025-05-07T19:42:51.9566396Z perl-Pod-Usage noarch 4:2.01-2.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:51.9567018Z perl-Scalar-List-Utils x86_64 4:1.56-459.amzn2023.0.2 amazonlinux 71 k 2025-05-07T19:42:51.9567632Z perl-SelectSaver noarch 1.02-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:51.9568232Z perl-Socket x86_64 4:2.032-1.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:51.9568800Z perl-Storable x86_64 1:3.21-458.amzn2023.0.2 amazonlinux 96 k 2025-05-07T19:42:51.9569351Z perl-Symbol noarch 1.08-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:51.9569968Z perl-Term-ANSIColor noarch 5.01-459.amzn2023.0.2 amazonlinux 48 k 2025-05-07T19:42:51.9570558Z perl-Term-Cap noarch 1.17-458.amzn2023.0.2 amazonlinux 22 k 2025-05-07T19:42:51.9571157Z perl-TermReadKey x86_64 2.38-9.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:51.9571763Z perl-Text-ParseWords noarch 3.30-458.amzn2023.0.2 amazonlinux 17 k 2025-05-07T19:42:51.9572419Z perl-Text-Tabs+Wrap noarch 2021.0726-1.amzn2023.0.1 amazonlinux 22 k 2025-05-07T19:42:51.9573129Z perl-Time-Local noarch 2:1.300-5.amzn2023.0.2 
amazonlinux 34 k 2025-05-07T19:42:51.9573689Z perl-URI noarch 5.09-1.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:51.9574260Z perl-base noarch 2.27-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:51.9574824Z perl-constant noarch 1.33-459.amzn2023.0.2 amazonlinux 23 k 2025-05-07T19:42:51.9575410Z perl-if noarch 0.60.800-477.amzn2023.0.6 amazonlinux 14 k 2025-05-07T19:42:51.9575974Z perl-interpreter x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 71 k 2025-05-07T19:42:51.9576505Z perl-lib x86_64 0.65-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:51.9577046Z perl-libnet noarch 3.13-2.amzn2023.0.2 amazonlinux 126 k 2025-05-07T19:42:51.9577569Z perl-libs x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 2.0 M 2025-05-07T19:42:51.9578191Z perl-mro x86_64 1.23-477.amzn2023.0.6 amazonlinux 29 k 2025-05-07T19:42:51.9578750Z perl-overload noarch 1.31-477.amzn2023.0.6 amazonlinux 46 k 2025-05-07T19:42:51.9579334Z perl-overloading noarch 0.02-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:51.9580184Z perl-parent noarch 1:0.238-458.amzn2023.0.2 amazonlinux 14 k 2025-05-07T19:42:51.9580822Z perl-podlators noarch 1:4.14-458.amzn2023.0.2 amazonlinux 112 k 2025-05-07T19:42:51.9581419Z perl-subs noarch 1.03-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:51.9582022Z perl-vars noarch 1.05-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:51.9582593Z shadow-utils x86_64 2:4.9-12.amzn2023.0.4 amazonlinux 1.1 M 2025-05-07T19:42:51.9583209Z systemd-libs x86_64 252.23-3.amzn2023 amazonlinux 613 k 2025-05-07T19:42:51.9583772Z util-linux x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 2.2 M 2025-05-07T19:42:51.9584373Z util-linux-core x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 432 k 2025-05-07T19:42:51.9584871Z Installing weak dependencies: 2025-05-07T19:42:51.9585355Z nano-default-editor noarch 8.3-1.amzn2023 amazonlinux 10 k 2025-05-07T19:42:51.9586022Z perl-IO-Socket-IP noarch 0.41-3.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:51.9586646Z perl-IO-Socket-SSL noarch 
2.075-1.amzn2023.0.2 amazonlinux 218 k 2025-05-07T19:42:51.9587298Z perl-Mozilla-CA noarch 20200520-4.amzn2023.0.2 amazonlinux 13 k 2025-05-07T19:42:51.9587922Z perl-NDBM_File x86_64 1.15-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:51.9588526Z sudo-python-plugin x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 56 k 2025-05-07T19:42:51.9588895Z 2025-05-07T19:42:51.9589028Z Transaction Summary 2025-05-07T19:42:51.9589334Z ======================================================================================== 2025-05-07T19:42:51.9589711Z Install 107 Packages 2025-05-07T19:42:51.9589874Z 2025-05-07T19:42:51.9590036Z Total download size: 38 M 2025-05-07T19:42:51.9590354Z Installed size: 151 M 2025-05-07T19:42:51.9590659Z Downloading Packages: 2025-05-07T19:42:52.2310926Z (1/107): cracklib-2.9.6-27.amzn2023.0.2.x86_64. 3.9 MB/s | 82 kB 00:00 2025-05-07T19:42:52.2472156Z (2/107): cyrus-sasl-lib-2.1.27-18.amzn2023.0.3. 21 MB/s | 786 kB 00:00 2025-05-07T19:42:52.2482114Z (3/107): elfutils-debuginfod-client-0.188-3.amz 2.7 MB/s | 41 kB 00:00 2025-05-07T19:42:52.2758459Z (4/107): binutils-2.41-50.amzn2023.0.3.x86_64.r 80 MB/s | 5.3 MB 00:00 2025-05-07T19:42:52.2812126Z (5/107): findutils-4.8.0-2.amzn2023.0.2.x86_64. 18 MB/s | 539 kB 00:00 2025-05-07T19:42:52.2824708Z (6/107): git-2.47.1-1.amzn2023.0.2.x86_64.rpm 1.7 MB/s | 54 kB 00:00 2025-05-07T19:42:52.3047848Z (7/107): gnutls-3.8.3-6.amzn2023.0.1.x86_64.rpm 55 MB/s | 1.1 MB 00:00 2025-05-07T19:42:52.3278118Z (8/107): git-core-2.47.1-1.amzn2023.0.2.x86_64. 
93 MB/s | 4.7 MB 00:00 2025-05-07T19:42:52.3409181Z (9/107): git-core-doc-2.47.1-1.amzn2023.0.2.noa 50 MB/s | 2.8 MB 00:00 2025-05-07T19:42:52.3488323Z (10/107): groff-base-1.22.4-7.amzn2023.0.2.x86_ 25 MB/s | 1.0 MB 00:00 2025-05-07T19:42:52.3589037Z (11/107): gzip-1.12-1.amzn2023.0.1.x86_64.rpm 9.7 MB/s | 160 kB 00:00 2025-05-07T19:42:52.3683192Z (12/107): hwdata-0.384-1.amzn2023.0.3.noarch.rp 64 MB/s | 1.6 MB 00:00 2025-05-07T19:42:52.3697588Z (13/107): jansson-2.14-0.amzn2023.x86_64.rpm 2.6 MB/s | 46 kB 00:00 2025-05-07T19:42:52.3772565Z (14/107): kmod-libs-29-2.amzn2023.0.5.x86_64.rp 9.5 MB/s | 62 kB 00:00 2025-05-07T19:42:52.3803763Z (15/107): less-608-2.amzn2023.0.2.x86_64.rpm 18 MB/s | 168 kB 00:00 2025-05-07T19:42:52.3818058Z (16/107): libcbor-0.7.0-3.amzn2023.0.2.x86_64.r 5.1 MB/s | 57 kB 00:00 2025-05-07T19:42:52.3893033Z (17/107): libdb-5.3.28-49.amzn2023.0.2.x86_64.r 65 MB/s | 756 kB 00:00 2025-05-07T19:42:52.3904618Z (18/107): libeconf-0.4.0-1.amzn2023.0.3.x86_64. 3.3 MB/s | 28 kB 00:00 2025-05-07T19:42:52.3947519Z (19/107): libedit-3.1-38.20210714cvs.amzn2023.0 8.6 MB/s | 108 kB 00:00 2025-05-07T19:42:52.3977064Z (20/107): libfdisk-2.37.4-1.amzn2023.0.4.x86_64 19 MB/s | 153 kB 00:00 2025-05-07T19:42:52.3997445Z (21/107): libfido2-1.10.0-2.amzn2023.0.2.x86_64 11 MB/s | 95 kB 00:00 2025-05-07T19:42:52.4017265Z (22/107): libmetalink-0.1.3-14.amzn2023.0.2.x86 5.1 MB/s | 31 kB 00:00 2025-05-07T19:42:52.4047452Z (23/107): libpwquality-1.4.4-6.amzn2023.0.2.x86 16 MB/s | 106 kB 00:00 2025-05-07T19:42:52.4076542Z (24/107): libsemanage-3.4-5.amzn2023.0.2.x86_64 16 MB/s | 121 kB 00:00 2025-05-07T19:42:52.4089900Z (25/107): libutempter-1.2.1-4.amzn2023.0.2.x86_ 3.6 MB/s | 26 kB 00:00 2025-05-07T19:42:52.4167332Z (26/107): nano-8.3-1.amzn2023.x86_64.rpm 60 MB/s | 706 kB 00:00 2025-05-07T19:42:52.4197734Z (27/107): nano-default-editor-8.3-1.amzn2023.no 897 kB/s | 10 kB 00:00 2025-05-07T19:42:52.4230328Z (28/107): ncurses-6.2-4.20200222.amzn2023.0.6.x 28 MB/s | 394 
kB 00:00 2025-05-07T19:42:52.4293174Z (29/107): nettle-3.10.1-1.amzn2023.0.1.x86_64.r 50 MB/s | 573 kB 00:00 2025-05-07T19:42:52.4346442Z (30/107): openssh-8.7p1-8.amzn2023.0.14.x86_64. 46 MB/s | 454 kB 00:00 2025-05-07T19:42:52.4381919Z (31/107): openldap-2.4.57-6.amzn2023.0.7.x86_64 19 MB/s | 256 kB 00:00 2025-05-07T19:42:52.4450099Z (32/107): openssh-clients-8.7p1-8.amzn2023.0.14 50 MB/s | 708 kB 00:00 2025-05-07T19:42:52.4506020Z (33/107): pam-1.5.1-8.amzn2023.0.4.x86_64.rpm 34 MB/s | 542 kB 00:00 2025-05-07T19:42:52.4527605Z (34/107): pciutils-3.7.0-3.amzn2023.0.2.x86_64. 7.1 MB/s | 93 kB 00:00 2025-05-07T19:42:52.4556826Z (35/107): pciutils-libs-3.7.0-3.amzn2023.0.2.x8 4.5 MB/s | 41 kB 00:00 2025-05-07T19:42:52.4582367Z (36/107): perl-AutoLoader-5.74-477.amzn2023.0.6 4.3 MB/s | 22 kB 00:00 2025-05-07T19:42:52.4610957Z (37/107): perl-B-1.80-477.amzn2023.0.6.x86_64.r 23 MB/s | 179 kB 00:00 2025-05-07T19:42:52.4629350Z (38/107): perl-Carp-1.50-458.amzn2023.0.2.noarc 4.3 MB/s | 29 kB 00:00 2025-05-07T19:42:52.4647023Z (39/107): perl-Class-Struct-0.66-477.amzn2023.0 3.8 MB/s | 22 kB 00:00 2025-05-07T19:42:52.4669254Z (40/107): perl-Data-Dumper-2.174-460.amzn2023.0 9.8 MB/s | 55 kB 00:00 2025-05-07T19:42:52.4687793Z (41/107): perl-Digest-1.20-1.amzn2023.0.2.noarc 4.8 MB/s | 26 kB 00:00 2025-05-07T19:42:52.4708006Z (42/107): perl-Digest-MD5-2.58-2.amzn2023.0.2.x 6.5 MB/s | 36 kB 00:00 2025-05-07T19:42:52.4728637Z (43/107): perl-DynaLoader-1.47-477.amzn2023.0.6 4.8 MB/s | 26 kB 00:00 2025-05-07T19:42:52.4790280Z (44/107): perl-Errno-1.30-477.amzn2023.0.6.x86_ 2.0 MB/s | 15 kB 00:00 2025-05-07T19:42:52.4887869Z (45/107): perl-Encode-3.15-462.amzn2023.0.2.x86 87 MB/s | 1.7 MB 00:00 2025-05-07T19:42:52.4899682Z (46/107): perl-Error-0.17029-5.amzn2023.0.2.noa 2.4 MB/s | 41 kB 00:00 2025-05-07T19:42:52.4919043Z (47/107): perl-Exporter-5.74-459.amzn2023.0.2.n 2.7 MB/s | 31 kB 00:00 2025-05-07T19:42:52.4956550Z (48/107): perl-Fcntl-1.13-477.amzn2023.0.6.x86_ 4.2 MB/s | 21 
kB 00:00 2025-05-07T19:42:52.4973097Z (49/107): perl-File-Basename-2.85-477.amzn2023. 2.6 MB/s | 18 kB 00:00 2025-05-07T19:42:52.4983196Z (50/107): perl-File-Find-1.37-477.amzn2023.0.6. 4.1 MB/s | 26 kB 00:00 2025-05-07T19:42:52.5005583Z (51/107): perl-File-Path-2.18-2.amzn2023.0.2.no 7.4 MB/s | 36 kB 00:00 2025-05-07T19:42:52.5037422Z (52/107): perl-File-stat-1.09-477.amzn2023.0.6. 3.7 MB/s | 17 kB 00:00 2025-05-07T19:42:52.5055765Z (53/107): perl-File-Temp-0.231.100-2.amzn2023.0 9.0 MB/s | 60 kB 00:00 2025-05-07T19:42:52.5071529Z (54/107): perl-FileHandle-2.03-477.amzn2023.0.6 2.5 MB/s | 16 kB 00:00 2025-05-07T19:42:52.5117285Z (55/107): perl-Getopt-Long-2.52-2.amzn2023.0.2. 7.9 MB/s | 60 kB 00:00 2025-05-07T19:42:52.5125429Z (56/107): perl-Getopt-Std-1.12-477.amzn2023.0.6 2.3 MB/s | 16 kB 00:00 2025-05-07T19:42:52.5145625Z (57/107): perl-Git-2.47.1-1.amzn2023.0.2.noarch 6.2 MB/s | 42 kB 00:00 2025-05-07T19:42:52.5199872Z (58/107): perl-IO-1.43-477.amzn2023.0.6.x86_64. 13 MB/s | 87 kB 00:00 2025-05-07T19:42:52.5220712Z (59/107): perl-HTTP-Tiny-0.078-1.amzn2023.0.3.n 6.3 MB/s | 56 kB 00:00 2025-05-07T19:42:52.5248119Z (60/107): perl-IO-Socket-IP-0.41-3.amzn2023.0.2 4.3 MB/s | 42 kB 00:00 2025-05-07T19:42:52.5285925Z (61/107): perl-IO-Socket-SSL-2.075-1.amzn2023.0 27 MB/s | 218 kB 00:00 2025-05-07T19:42:52.5306520Z (62/107): perl-IPC-Open3-1.21-477.amzn2023.0.6. 2.8 MB/s | 23 kB 00:00 2025-05-07T19:42:52.5321920Z (63/107): perl-MIME-Base64-3.16-2.amzn2023.0.2. 4.7 MB/s | 31 kB 00:00 2025-05-07T19:42:52.5342304Z (64/107): perl-Mozilla-CA-20200520-4.amzn2023.0 2.5 MB/s | 13 kB 00:00 2025-05-07T19:42:52.5369852Z (65/107): perl-NDBM_File-1.15-477.amzn2023.0.6. 5.1 MB/s | 23 kB 00:00 2025-05-07T19:42:52.5418652Z (66/107): perl-Net-SSLeay-1.94-1.amzn2023.0.1.x 43 MB/s | 392 kB 00:00 2025-05-07T19:42:52.5450163Z (67/107): perl-POSIX-1.94-477.amzn2023.0.6.x86_ 9.4 MB/s | 97 kB 00:00 2025-05-07T19:42:52.5466021Z (68/107): perl-PathTools-3.78-459.amzn2023.0.2. 
9.6 MB/s | 85 kB 00:00 2025-05-07T19:42:52.5487143Z (69/107): perl-Pod-Escapes-1.07-458.amzn2023.0. 3.3 MB/s | 20 kB 00:00 2025-05-07T19:42:52.5530097Z (70/107): perl-Pod-Perldoc-3.28.01-459.amzn2023 14 MB/s | 84 kB 00:00 2025-05-07T19:42:52.5566757Z (71/107): perl-Pod-Simple-3.42-2.amzn2023.0.2.n 23 MB/s | 215 kB 00:00 2025-05-07T19:42:52.5582376Z (72/107): perl-Pod-Usage-2.01-2.amzn2023.0.2.no 4.4 MB/s | 41 kB 00:00 2025-05-07T19:42:52.5609536Z (73/107): perl-Scalar-List-Utils-1.56-459.amzn2 9.7 MB/s | 71 kB 00:00 2025-05-07T19:42:52.5637320Z (74/107): perl-SelectSaver-1.02-477.amzn2023.0. 2.6 MB/s | 12 kB 00:00 2025-05-07T19:42:52.5656971Z (75/107): perl-Socket-2.032-1.amzn2023.0.2.x86_ 8.3 MB/s | 55 kB 00:00 2025-05-07T19:42:52.5680688Z (76/107): perl-Storable-3.21-458.amzn2023.0.2.x 14 MB/s | 96 kB 00:00 2025-05-07T19:42:52.5699260Z (77/107): perl-Symbol-1.08-477.amzn2023.0.6.noa 2.6 MB/s | 15 kB 00:00 2025-05-07T19:42:52.5738995Z (78/107): perl-Term-ANSIColor-5.01-459.amzn2023 6.2 MB/s | 48 kB 00:00 2025-05-07T19:42:52.5757681Z (79/107): perl-Term-Cap-1.17-458.amzn2023.0.2.n 3.1 MB/s | 22 kB 00:00 2025-05-07T19:42:52.5767823Z (80/107): perl-TermReadKey-2.38-9.amzn2023.0.2. 5.5 MB/s | 36 kB 00:00 2025-05-07T19:42:52.5788550Z (81/107): perl-Text-ParseWords-3.30-458.amzn202 3.5 MB/s | 17 kB 00:00 2025-05-07T19:42:52.5825181Z (82/107): perl-Text-Tabs+Wrap-2021.0726-1.amzn2 4.4 MB/s | 22 kB 00:00 2025-05-07T19:42:52.5840139Z (83/107): perl-Time-Local-1.300-5.amzn2023.0.2. 5.0 MB/s | 34 kB 00:00 2025-05-07T19:42:52.5869681Z (84/107): perl-URI-5.09-1.amzn2023.0.2.noarch.r 14 MB/s | 108 kB 00:00 2025-05-07T19:42:52.5890529Z (85/107): perl-base-2.27-477.amzn2023.0.6.noarc 3.8 MB/s | 17 kB 00:00 2025-05-07T19:42:52.5910837Z (86/107): perl-constant-1.33-459.amzn2023.0.2.n 3.6 MB/s | 23 kB 00:00 2025-05-07T19:42:52.5929980Z (87/107): perl-if-0.60.800-477.amzn2023.0.6.noa 2.5 MB/s | 14 kB 00:00 2025-05-07T19:42:52.5950623Z (88/107): perl-interpreter-5.32.1-477.amzn2023. 
13 MB/s | 71 kB 00:00 2025-05-07T19:42:52.5971147Z (89/107): perl-lib-0.65-477.amzn2023.0.6.x86_64 2.9 MB/s | 15 kB 00:00 2025-05-07T19:42:52.5998077Z (90/107): perl-libnet-3.13-2.amzn2023.0.2.noarc 20 MB/s | 126 kB 00:00 2025-05-07T19:42:52.6138298Z (91/107): perl-libs-5.32.1-477.amzn2023.0.6.x86 114 MB/s | 2.0 MB 00:00 2025-05-07T19:42:52.6158446Z (92/107): perl-mro-1.23-477.amzn2023.0.6.x86_64 1.6 MB/s | 29 kB 00:00 2025-05-07T19:42:52.6173079Z (93/107): perl-overload-1.31-477.amzn2023.0.6.n 3.1 MB/s | 46 kB 00:00 2025-05-07T19:42:52.6191982Z (94/107): perl-overloading-0.02-477.amzn2023.0. 2.6 MB/s | 13 kB 00:00 2025-05-07T19:42:52.6237433Z (95/107): perl-parent-0.238-458.amzn2023.0.2.no 2.5 MB/s | 14 kB 00:00 2025-05-07T19:42:52.6262722Z (96/107): perl-podlators-4.14-458.amzn2023.0.2. 14 MB/s | 112 kB 00:00 2025-05-07T19:42:52.6285506Z (97/107): perl-subs-1.03-477.amzn2023.0.6.noarc 1.4 MB/s | 12 kB 00:00 2025-05-07T19:42:52.6319527Z (98/107): perl-vars-1.05-477.amzn2023.0.6.noarc 2.7 MB/s | 13 kB 00:00 2025-05-07T19:42:52.6426403Z (99/107): sudo-1.9.15-1.p5.amzn2023.0.1.x86_64. 94 MB/s | 1.3 MB 00:00 2025-05-07T19:42:52.6501034Z (100/107): shadow-utils-4.9-12.amzn2023.0.4.x86 49 MB/s | 1.1 MB 00:00 2025-05-07T19:42:52.6512259Z (101/107): sudo-python-plugin-1.9.15-1.p5.amzn2 3.0 MB/s | 56 kB 00:00 2025-05-07T19:42:52.6568675Z (102/107): systemd-libs-252.23-3.amzn2023.x86_6 47 MB/s | 613 kB 00:00 2025-05-07T19:42:52.6682775Z (103/107): tar-1.34-1.amzn2023.0.4.x86_64.rpm 54 MB/s | 879 kB 00:00 2025-05-07T19:42:52.6804701Z (104/107): util-linux-2.37.4-1.amzn2023.0.4.x86 78 MB/s | 2.2 MB 00:00 2025-05-07T19:42:52.6841483Z (105/107): util-linux-core-2.37.4-1.amzn2023.0. 
16 MB/s | 432 kB 00:00 2025-05-07T19:42:52.6909667Z (106/107): wget-1.21.3-1.amzn2023.0.4.x86_64.rp 35 MB/s | 779 kB 00:00 2025-05-07T19:42:52.6925839Z (107/107): which-2.21-26.amzn2023.0.2.x86_64.rp 6.0 MB/s | 42 kB 00:00 2025-05-07T19:42:52.6944184Z -------------------------------------------------------------------------------- 2025-05-07T19:42:52.6945584Z Total 50 MB/s | 38 MB 00:00 2025-05-07T19:42:53.7498305Z Running transaction check 2025-05-07T19:42:53.8070947Z Transaction check succeeded. 2025-05-07T19:42:53.8071866Z Running transaction test 2025-05-07T19:42:54.2230639Z Transaction test succeeded. 2025-05-07T19:42:54.2232312Z Running transaction 2025-05-07T19:42:54.9939382Z Preparing : 1/1 2025-05-07T19:42:55.0130993Z Installing : systemd-libs-252.23-3.amzn2023.x86_64 1/107 2025-05-07T19:42:55.0423280Z Installing : nettle-3.10.1-1.amzn2023.0.1.x86_64 2/107 2025-05-07T19:42:55.0689855Z Installing : gnutls-3.8.3-6.amzn2023.0.1.x86_64 3/107 2025-05-07T19:42:55.0771655Z Installing : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:55.0839872Z Running scriptlet: util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:55.0965952Z Installing : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:55.1276573Z Installing : ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 6/107 2025-05-07T19:42:55.1365145Z Installing : nano-8.3-1.amzn2023.x86_64 7/107 2025-05-07T19:42:55.1433087Z Installing : nano-default-editor-8.3-1.amzn2023.noarch 8/107 2025-05-07T19:42:55.2023335Z Installing : libsemanage-3.4-5.amzn2023.0.2.x86_64 9/107 2025-05-07T19:42:55.2117052Z Installing : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 10/107 2025-05-07T19:42:55.2567003Z Running scriptlet: libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:55.2636815Z Installing : libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:55.2711867Z Installing : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 12/107 2025-05-07T19:42:55.2781080Z Installing : 
libfdisk-2.37.4-1.amzn2023.0.4.x86_64 13/107 2025-05-07T19:42:55.2845541Z Installing : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 14/107 2025-05-07T19:42:55.2992660Z Installing : libeconf-0.4.0-1.amzn2023.0.3.x86_64 15/107 2025-05-07T19:42:55.3060034Z Installing : libdb-5.3.28-49.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:55.3126684Z Installing : libcbor-0.7.0-3.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:55.3211808Z Installing : libfido2-1.10.0-2.amzn2023.0.2.x86_64 18/107 2025-05-07T19:42:55.3287047Z Installing : less-608-2.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:55.3344993Z Installing : kmod-libs-29-2.amzn2023.0.5.x86_64 20/107 2025-05-07T19:42:55.3797409Z Installing : jansson-2.14-0.amzn2023.x86_64 21/107 2025-05-07T19:42:55.3891091Z Installing : hwdata-0.384-1.amzn2023.0.3.noarch 22/107 2025-05-07T19:42:55.4058557Z Installing : gzip-1.12-1.amzn2023.0.1.x86_64 23/107 2025-05-07T19:42:55.4512713Z Installing : cracklib-2.9.6-27.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:55.4707443Z Installing : pam-1.5.1-8.amzn2023.0.4.x86_64 25/107 2025-05-07T19:42:55.5563742Z Installing : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 26/107 2025-05-07T19:42:55.5564513Z Installing : util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:55.5565031Z warning: /etc/adjtime created as /etc/adjtime.rpmnew 2025-05-07T19:42:55.5565338Z 2025-05-07T19:42:55.5803058Z Running scriptlet: util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:55.6161638Z Running scriptlet: openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:55.6365946Z Installing : openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:55.6439147Z Installing : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:55.7562424Z Running scriptlet: openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:55.9088714Z Installing : git-core-2.47.1-1.amzn2023.0.2.x86_64 30/107 2025-05-07T19:42:55.9221033Z Installing : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 31/107 
2025-05-07T19:42:55.9667026Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:55.9749566Z Installing : groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:55.9838928Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:55.9919881Z Installing : perl-Digest-1.20-1.amzn2023.0.2.noarch 33/107 2025-05-07T19:42:56.0012483Z Installing : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:56.0071265Z Installing : perl-B-1.80-477.amzn2023.0.6.x86_64 35/107 2025-05-07T19:42:56.0120148Z Installing : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:56.0180791Z Installing : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 37/107 2025-05-07T19:42:56.0271668Z Installing : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 38/107 2025-05-07T19:42:56.0346587Z Installing : perl-libnet-3.13-2.amzn2023.0.2.noarch 39/107 2025-05-07T19:42:56.0453430Z Installing : perl-base-2.27-477.amzn2023.0.6.noarch 40/107 2025-05-07T19:42:56.0663073Z Installing : perl-URI-5.09-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:56.0753423Z Installing : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 42/107 2025-05-07T19:42:56.0810267Z Installing : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 43/107 2025-05-07T19:42:56.0860023Z Installing : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 44/107 2025-05-07T19:42:56.0925178Z Installing : perl-if-0.60.800-477.amzn2023.0.6.noarch 45/107 2025-05-07T19:42:56.0984465Z Installing : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:56.1046559Z Installing : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:56.1134549Z Installing : perl-File-Path-2.18-2.amzn2023.0.2.noarch 48/107 2025-05-07T19:42:56.1202817Z Installing : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 49/107 2025-05-07T19:42:56.1257306Z Installing : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 50/107 2025-05-07T19:42:56.1319457Z Installing : 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 51/107 2025-05-07T19:42:56.1377861Z Installing : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 52/107 2025-05-07T19:42:56.1431993Z Installing : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 53/107 2025-05-07T19:42:56.1478956Z Installing : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:56.1534917Z Installing : perl-subs-1.03-477.amzn2023.0.6.noarch 55/107 2025-05-07T19:42:56.1603180Z Installing : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 56/107 2025-05-07T19:42:56.1661428Z Installing : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 57/107 2025-05-07T19:42:56.1768905Z Installing : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 58/107 2025-05-07T19:42:56.1862208Z Installing : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 59/107 2025-05-07T19:42:56.1920422Z Installing : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 60/107 2025-05-07T19:42:56.1967949Z Installing : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 61/107 2025-05-07T19:42:56.2010402Z Installing : perl-Symbol-1.08-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:56.2092038Z Installing : perl-File-stat-1.09-477.amzn2023.0.6.noarch 63/107 2025-05-07T19:42:56.2188775Z Installing : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:56.2263055Z Installing : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 65/107 2025-05-07T19:42:56.2323878Z Installing : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 66/107 2025-05-07T19:42:56.2381194Z Installing : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 67/107 2025-05-07T19:42:56.2456470Z Installing : perl-mro-1.23-477.amzn2023.0.6.x86_64 68/107 2025-05-07T19:42:56.2523339Z Installing : perl-IO-1.43-477.amzn2023.0.6.x86_64 69/107 2025-05-07T19:42:56.2584446Z Installing : perl-overloading-0.02-477.amzn2023.0.6.noarch 70/107 2025-05-07T19:42:56.2662416Z Installing : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 71/107 2025-05-07T19:42:56.2707614Z Installing : perl-Errno-1.30-477.amzn2023.0.6.x86_64 72/107 
2025-05-07T19:42:56.2760289Z Installing : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 73/107 2025-05-07T19:42:56.2820603Z Installing : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:56.2894167Z Installing : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:56.2974528Z Installing : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 76/107 2025-05-07T19:42:56.3043161Z Installing : perl-constant-1.33-459.amzn2023.0.2.noarch 77/107 2025-05-07T19:42:56.3109856Z Installing : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 78/107 2025-05-07T19:42:56.3163017Z Installing : perl-overload-1.31-477.amzn2023.0.6.noarch 79/107 2025-05-07T19:42:56.3217240Z Installing : perl-parent-1:0.238-458.amzn2023.0.2.noarch 80/107 2025-05-07T19:42:56.3284966Z Installing : perl-vars-1.05-477.amzn2023.0.6.noarch 81/107 2025-05-07T19:42:56.3339648Z Installing : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 82/107 2025-05-07T19:42:56.3392702Z Installing : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 83/107 2025-05-07T19:42:56.3448029Z Installing : perl-Carp-1.50-458.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:56.3503653Z Installing : perl-Exporter-5.74-459.amzn2023.0.2.noarch 85/107 2025-05-07T19:42:56.3582237Z Installing : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 86/107 2025-05-07T19:42:56.4120818Z Installing : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 87/107 2025-05-07T19:42:56.5096947Z Installing : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 88/107 2025-05-07T19:42:56.5229251Z Installing : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:56.5317432Z Installing : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 90/107 2025-05-07T19:42:56.5383031Z Installing : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 91/107 2025-05-07T19:42:56.5450757Z Installing : perl-File-Find-1.37-477.amzn2023.0.6.noarch 92/107 2025-05-07T19:42:56.5521216Z Installing : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 93/107 2025-05-07T19:42:56.5575251Z Installing : 
perl-lib-0.65-477.amzn2023.0.6.x86_64 94/107 2025-05-07T19:42:56.5638781Z Installing : perl-Git-2.47.1-1.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:56.5708763Z Installing : git-2.47.1-1.amzn2023.0.2.x86_64 96/107 2025-05-07T19:42:56.5919589Z Installing : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 97/107 2025-05-07T19:42:56.6053857Z Installing : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 98/107 2025-05-07T19:42:56.6133498Z Installing : openldap-2.4.57-6.amzn2023.0.7.x86_64 99/107 2025-05-07T19:42:56.6538086Z Installing : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 100/107 2025-05-07T19:42:56.7768299Z Installing : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 101/107 2025-05-07T19:42:56.7862798Z Installing : binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:56.7972988Z Running scriptlet: binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:56.8272088Z Installing : pciutils-3.7.0-3.amzn2023.0.2.x86_64 103/107 2025-05-07T19:42:56.8374070Z Installing : wget-1.21.3-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:56.8618569Z Installing : which-2.21-26.amzn2023.0.2.x86_64 105/107 2025-05-07T19:42:56.8838581Z Installing : tar-2:1.34-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:56.8928095Z Installing : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:56.9049388Z Running scriptlet: pam-1.5.1-8.amzn2023.0.4.x86_64 107/107 2025-05-07T19:42:57.7287357Z Running scriptlet: findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:57.7288105Z Verifying : binutils-2.41-50.amzn2023.0.3.x86_64 1/107 2025-05-07T19:42:57.7288872Z Verifying : cracklib-2.9.6-27.amzn2023.0.2.x86_64 2/107 2025-05-07T19:42:57.7289559Z Verifying : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 3/107 2025-05-07T19:42:57.7290234Z Verifying : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 
4/107 2025-05-07T19:42:57.7290929Z Verifying : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:57.7291566Z Verifying : git-2.47.1-1.amzn2023.0.2.x86_64 6/107 2025-05-07T19:42:57.7292308Z Verifying : git-core-2.47.1-1.amzn2023.0.2.x86_64 7/107 2025-05-07T19:42:57.7292993Z Verifying : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 8/107 2025-05-07T19:42:57.7294038Z Verifying : gnutls-3.8.3-6.amzn2023.0.1.x86_64 9/107 2025-05-07T19:42:57.7294616Z Verifying : groff-base-1.22.4-7.amzn2023.0.2.x86_64 10/107 2025-05-07T19:42:57.7295341Z Verifying : gzip-1.12-1.amzn2023.0.1.x86_64 11/107 2025-05-07T19:42:57.7295926Z Verifying : hwdata-0.384-1.amzn2023.0.3.noarch 12/107 2025-05-07T19:42:57.7296547Z Verifying : jansson-2.14-0.amzn2023.x86_64 13/107 2025-05-07T19:42:57.7297201Z Verifying : kmod-libs-29-2.amzn2023.0.5.x86_64 14/107 2025-05-07T19:42:57.7297838Z Verifying : less-608-2.amzn2023.0.2.x86_64 15/107 2025-05-07T19:42:57.7298452Z Verifying : libcbor-0.7.0-3.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:57.7299108Z Verifying : libdb-5.3.28-49.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:57.7300077Z Verifying : libeconf-0.4.0-1.amzn2023.0.3.x86_64 18/107 2025-05-07T19:42:57.7301318Z Verifying : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:57.7302066Z Verifying : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 20/107 2025-05-07T19:42:57.7302726Z Verifying : libfido2-1.10.0-2.amzn2023.0.2.x86_64 21/107 2025-05-07T19:42:57.7303356Z Verifying : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 22/107 2025-05-07T19:42:57.7304068Z Verifying : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 23/107 2025-05-07T19:42:57.7304700Z Verifying : libsemanage-3.4-5.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:57.7305389Z Verifying : libutempter-1.2.1-4.amzn2023.0.2.x86_64 25/107 2025-05-07T19:42:57.7306193Z Verifying : nano-8.3-1.amzn2023.x86_64 26/107 2025-05-07T19:42:57.7306801Z Verifying : nano-default-editor-8.3-1.amzn2023.noarch 27/107 2025-05-07T19:42:57.7307486Z Verifying : 
ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 28/107 2025-05-07T19:42:57.7308168Z Verifying : nettle-3.10.1-1.amzn2023.0.1.x86_64 29/107 2025-05-07T19:42:57.7308806Z Verifying : openldap-2.4.57-6.amzn2023.0.7.x86_64 30/107 2025-05-07T19:42:57.7309554Z Verifying : openssh-8.7p1-8.amzn2023.0.14.x86_64 31/107 2025-05-07T19:42:57.7310221Z Verifying : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 32/107 2025-05-07T19:42:57.7310890Z Verifying : pam-1.5.1-8.amzn2023.0.4.x86_64 33/107 2025-05-07T19:42:57.7311474Z Verifying : pciutils-3.7.0-3.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:57.7312160Z Verifying : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 35/107 2025-05-07T19:42:57.7312839Z Verifying : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:57.7313647Z Verifying : perl-B-1.80-477.amzn2023.0.6.x86_64 37/107 2025-05-07T19:42:57.7314314Z Verifying : perl-Carp-1.50-458.amzn2023.0.2.noarch 38/107 2025-05-07T19:42:57.7314942Z Verifying : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 39/107 2025-05-07T19:42:57.7315622Z Verifying : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 40/107 2025-05-07T19:42:57.7316194Z Verifying : perl-Digest-1.20-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:57.7316760Z Verifying : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 42/107 2025-05-07T19:42:57.7317345Z Verifying : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 43/107 2025-05-07T19:42:57.7317909Z Verifying : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 44/107 2025-05-07T19:42:57.7318502Z Verifying : perl-Errno-1.30-477.amzn2023.0.6.x86_64 45/107 2025-05-07T19:42:57.7319329Z Verifying : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:57.7319905Z Verifying : perl-Exporter-5.74-459.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:57.7320459Z Verifying : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 48/107 2025-05-07T19:42:57.7321053Z Verifying : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 49/107 2025-05-07T19:42:57.7321642Z Verifying : 
perl-File-Find-1.37-477.amzn2023.0.6.noarch 50/107 2025-05-07T19:42:57.7322239Z Verifying : perl-File-Path-2.18-2.amzn2023.0.2.noarch 51/107 2025-05-07T19:42:57.7322824Z Verifying : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 52/107 2025-05-07T19:42:57.7323378Z Verifying : perl-File-stat-1.09-477.amzn2023.0.6.noarch 53/107 2025-05-07T19:42:57.7323973Z Verifying : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:57.7324544Z Verifying : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 55/107 2025-05-07T19:42:57.7325264Z Verifying : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 56/107 2025-05-07T19:42:57.7325813Z Verifying : perl-Git-2.47.1-1.amzn2023.0.2.noarch 57/107 2025-05-07T19:42:57.7326378Z Verifying : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 58/107 2025-05-07T19:42:57.7326938Z Verifying : perl-IO-1.43-477.amzn2023.0.6.x86_64 59/107 2025-05-07T19:42:57.7327473Z Verifying : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 60/107 2025-05-07T19:42:57.7328051Z Verifying : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 61/107 2025-05-07T19:42:57.7328607Z Verifying : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:57.7329174Z Verifying : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 63/107 2025-05-07T19:42:57.7329750Z Verifying : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:57.7330295Z Verifying : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 65/107 2025-05-07T19:42:57.7330835Z Verifying : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 66/107 2025-05-07T19:42:57.7331366Z Verifying : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 67/107 2025-05-07T19:42:57.7331929Z Verifying : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 68/107 2025-05-07T19:42:57.7332468Z Verifying : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 69/107 2025-05-07T19:42:57.7333038Z Verifying : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 70/107 2025-05-07T19:42:57.7333606Z Verifying : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 71/107 
2025-05-07T19:42:57.7334134Z Verifying : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 72/107 2025-05-07T19:42:57.7334690Z Verifying : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 73/107 2025-05-07T19:42:57.7336189Z Verifying : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:57.7336754Z Verifying : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:57.7337278Z Verifying : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 76/107 2025-05-07T19:42:57.7337832Z Verifying : perl-Symbol-1.08-477.amzn2023.0.6.noarch 77/107 2025-05-07T19:42:57.7338407Z Verifying : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 78/107 2025-05-07T19:42:57.7338968Z Verifying : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 79/107 2025-05-07T19:42:57.7339671Z Verifying : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 80/107 2025-05-07T19:42:57.7340419Z Verifying : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 81/107 2025-05-07T19:42:57.7341080Z Verifying : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 82/107 2025-05-07T19:42:57.7341660Z Verifying : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 83/107 2025-05-07T19:42:57.7342287Z Verifying : perl-URI-5.09-1.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:57.7342873Z Verifying : perl-base-2.27-477.amzn2023.0.6.noarch 85/107 2025-05-07T19:42:57.7343423Z Verifying : perl-constant-1.33-459.amzn2023.0.2.noarch 86/107 2025-05-07T19:42:57.7343984Z Verifying : perl-if-0.60.800-477.amzn2023.0.6.noarch 87/107 2025-05-07T19:42:57.7344552Z Verifying : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 88/107 2025-05-07T19:42:57.7345095Z Verifying : perl-lib-0.65-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:57.7345671Z Verifying : perl-libnet-3.13-2.amzn2023.0.2.noarch 90/107 2025-05-07T19:42:57.7346314Z Verifying : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 91/107 2025-05-07T19:42:57.7346822Z Verifying : perl-mro-1.23-477.amzn2023.0.6.x86_64 92/107 2025-05-07T19:42:57.7347336Z Verifying : 
perl-overload-1.31-477.amzn2023.0.6.noarch 93/107 2025-05-07T19:42:57.7347892Z Verifying : perl-overloading-0.02-477.amzn2023.0.6.noarch 94/107 2025-05-07T19:42:57.7348433Z Verifying : perl-parent-1:0.238-458.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:57.7348941Z Verifying : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 96/107 2025-05-07T19:42:57.7349469Z Verifying : perl-subs-1.03-477.amzn2023.0.6.noarch 97/107 2025-05-07T19:42:57.7349971Z Verifying : perl-vars-1.05-477.amzn2023.0.6.noarch 98/107 2025-05-07T19:42:57.7350488Z Verifying : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 99/107 2025-05-07T19:42:57.7350995Z Verifying : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 100/107 2025-05-07T19:42:57.7351508Z Verifying : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 101/107 2025-05-07T19:42:57.7352057Z Verifying : systemd-libs-252.23-3.amzn2023.x86_64 102/107 2025-05-07T19:42:57.7352542Z Verifying : tar-2:1.34-1.amzn2023.0.4.x86_64 103/107 2025-05-07T19:42:57.7353046Z Verifying : util-linux-2.37.4-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:57.7353557Z Verifying : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 105/107 2025-05-07T19:42:57.7354074Z Verifying : wget-1.21.3-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:57.8347028Z Verifying : which-2.21-26.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:57.8347479Z 2025-05-07T19:42:57.8347576Z Installed: 2025-05-07T19:42:57.8347928Z binutils-2.41-50.amzn2023.0.3.x86_64 2025-05-07T19:42:57.8348495Z cracklib-2.9.6-27.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8349073Z cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 2025-05-07T19:42:57.8350019Z elfutils-debuginfod-client-0.188-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8350619Z findutils-1:4.8.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8351131Z git-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8351654Z git-core-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8352193Z git-core-doc-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.8352746Z gnutls-3.8.3-6.amzn2023.0.1.x86_64 
2025-05-07T19:42:57.8353289Z groff-base-1.22.4-7.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8353806Z gzip-1.12-1.amzn2023.0.1.x86_64 2025-05-07T19:42:57.8354344Z hwdata-0.384-1.amzn2023.0.3.noarch 2025-05-07T19:42:57.8355078Z jansson-2.14-0.amzn2023.x86_64 2025-05-07T19:42:57.8355630Z kmod-libs-29-2.amzn2023.0.5.x86_64 2025-05-07T19:42:57.8356174Z less-608-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8356691Z libcbor-0.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8357237Z libdb-5.3.28-49.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8357767Z libeconf-0.4.0-1.amzn2023.0.3.x86_64 2025-05-07T19:42:57.8358353Z libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8358906Z libfdisk-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8359487Z libfido2-1.10.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8360049Z libmetalink-0.1.3-14.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8360620Z libpwquality-1.4.4-6.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8361192Z libsemanage-3.4-5.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8361744Z libutempter-1.2.1-4.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8362290Z nano-8.3-1.amzn2023.x86_64 2025-05-07T19:42:57.8362977Z nano-default-editor-8.3-1.amzn2023.noarch 2025-05-07T19:42:57.8363642Z ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8364163Z nettle-3.10.1-1.amzn2023.0.1.x86_64 2025-05-07T19:42:57.8364658Z openldap-2.4.57-6.amzn2023.0.7.x86_64 2025-05-07T19:42:57.8365183Z openssh-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:57.8365715Z openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:57.8366242Z pam-1.5.1-8.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8366750Z pciutils-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8367274Z pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8367852Z perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8368384Z perl-B-1.80-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8368920Z perl-Carp-1.50-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.8369474Z 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8370062Z perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8370727Z perl-Digest-1.20-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.8371262Z perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8371846Z perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8372384Z perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8372927Z perl-Errno-1.30-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8373457Z perl-Error-1:0.17029-5.amzn2023.0.2.noarch 2025-05-07T19:42:57.8373982Z perl-Exporter-5.74-459.amzn2023.0.2.noarch 2025-05-07T19:42:57.8374531Z perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8375068Z perl-File-Basename-2.85-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8375637Z perl-File-Find-1.37-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8376234Z perl-File-Path-2.18-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.8376785Z perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.8377335Z perl-File-stat-1.09-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8377878Z perl-FileHandle-2.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8378439Z perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.8378969Z perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8379784Z perl-Git-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.8380533Z perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 2025-05-07T19:42:57.8381099Z perl-IO-1.43-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8381705Z perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 2025-05-07T19:42:57.8382311Z perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.8382951Z perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8383544Z perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8384179Z perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 2025-05-07T19:42:57.8384801Z perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8385377Z 
perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 2025-05-07T19:42:57.8385973Z perl-POSIX-1.94-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8386553Z perl-PathTools-3.78-459.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8387170Z perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.8387764Z perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 2025-05-07T19:42:57.8388367Z perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.8388964Z perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.8389539Z perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8390171Z perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8390749Z perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8391309Z perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8391895Z perl-Symbol-1.08-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8392653Z perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 2025-05-07T19:42:57.8393217Z perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.8393750Z perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8394326Z perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.8394892Z perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noarch 2025-05-07T19:42:57.8395454Z perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 2025-05-07T19:42:57.8395982Z perl-URI-5.09-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.8396487Z perl-base-2.27-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8397034Z perl-constant-1.33-459.amzn2023.0.2.noarch 2025-05-07T19:42:57.8397554Z perl-if-0.60.800-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8398168Z perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8398673Z perl-lib-0.65-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8399204Z perl-libnet-3.13-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.8399731Z perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8400417Z perl-mro-1.23-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.8401416Z 
perl-overload-1.31-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8402078Z perl-overloading-0.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8402680Z perl-parent-1:0.238-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.8403267Z perl-podlators-1:4.14-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.8403863Z perl-subs-1.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8404433Z perl-vars-1.05-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.8404985Z shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8405533Z sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:57.8406080Z sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:57.8406720Z systemd-libs-252.23-3.amzn2023.x86_64 2025-05-07T19:42:57.8407258Z tar-2:1.34-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8407771Z util-linux-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8408341Z util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8408880Z wget-1.21.3-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.8409417Z which-2.21-26.amzn2023.0.2.x86_64 2025-05-07T19:42:57.8409732Z 2025-05-07T19:42:57.8409840Z Complete! 
2025-05-07T19:42:57.9116227Z ##[group]Run actions/checkout@v4 2025-05-07T19:42:57.9116602Z with: 2025-05-07T19:42:57.9116827Z submodules: true 2025-05-07T19:42:57.9117117Z repository: pytorch/FBGEMM 2025-05-07T19:42:57.9117613Z token: *** 2025-05-07T19:42:57.9117838Z ssh-strict: true 2025-05-07T19:42:57.9118099Z ssh-user: git 2025-05-07T19:42:57.9118347Z persist-credentials: true 2025-05-07T19:42:57.9118649Z clean: true 2025-05-07T19:42:57.9118892Z sparse-checkout-cone-mode: true 2025-05-07T19:42:57.9119206Z fetch-depth: 1 2025-05-07T19:42:57.9119437Z fetch-tags: false 2025-05-07T19:42:57.9119693Z show-progress: true 2025-05-07T19:42:57.9119928Z lfs: false 2025-05-07T19:42:57.9120176Z set-safe-directory: true 2025-05-07T19:42:57.9120628Z env: 2025-05-07T19:42:57.9120891Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:57.9121242Z BUILD_ENV: build_binary 2025-05-07T19:42:57.9121495Z BUILD_TARGET: genai 2025-05-07T19:42:57.9121818Z BUILD_VARIANT: cuda 2025-05-07T19:42:57.9122149Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:57.9122440Z ##[endgroup] 2025-05-07T19:42:57.9167059Z ##[command]/usr/bin/docker exec 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T19:42:58.2116682Z Syncing repository: pytorch/FBGEMM 2025-05-07T19:42:58.2118089Z ##[group]Getting Git version info 2025-05-07T19:42:58.2118458Z Working directory is '/__w/FBGEMM/FBGEMM' 2025-05-07T19:42:58.2118990Z [command]/usr/bin/git version 2025-05-07T19:42:58.2119284Z git version 2.47.1 2025-05-07T19:42:58.2120247Z ##[endgroup] 2025-05-07T19:42:58.2141460Z Temporarily overriding HOME='/__w/_temp/d6e4e2f3-f586-47a9-95e2-7c539a524b10' before making global git config changes 2025-05-07T19:42:58.2142314Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T19:42:58.2143348Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T19:42:58.2174982Z [command]/usr/bin/git 
config --local --get remote.origin.url 2025-05-07T19:42:58.2192345Z https://github.com/pytorch/FBGEMM 2025-05-07T19:42:58.2206251Z ##[group]Removing previously created refs, to avoid conflicts 2025-05-07T19:42:58.2208900Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-05-07T19:42:58.2227522Z HEAD 2025-05-07T19:42:58.2263285Z ##[endgroup] 2025-05-07T19:42:58.2264053Z [command]/usr/bin/git submodule status 2025-05-07T19:42:58.2682584Z e5d7c0bd5d9aec44d68830187138149e6a8c4e32 external/asmjit (e5d7c0b) 2025-05-07T19:42:58.2824558Z 4a61bdd4bd4ed730e078aebc7c0fcf046ff29406 external/composable_kernel (4a61bdd) 2025-05-07T19:42:58.2948454Z 6543fec09b2f04ac4a666882998b534afc9c1349 external/cpuinfo (6543fec) 2025-05-07T19:42:58.3102307Z 3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3 external/cutlass (3ed8d2e) 2025-05-07T19:42:58.3245305Z f8d7d77c06936315286eb55f8de22cd23c188571 external/googletest (f8d7d77) 2025-05-07T19:42:58.3373380Z 420084499c7c1e1c2d801922f40df202eac5f3a0 external/hipify_torch (4200844) 2025-05-07T19:42:58.3518567Z 9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03 external/json (9cca280) 2025-05-07T19:42:58.3523555Z ##[group]Cleaning the repository 2025-05-07T19:42:58.3526319Z [command]/usr/bin/git clean -ffdx 2025-05-07T19:42:58.4201208Z Removing build_only/ 2025-05-07T19:42:58.4202074Z Removing collect_env.py 2025-05-07T19:42:58.4202905Z Removing fbgemm_gpu/_skbuild/ 2025-05-07T19:42:58.4203900Z Removing fbgemm_gpu/codegen/genscript/__pycache__/ 2025-05-07T19:42:58.4204997Z Removing fbgemm_gpu/dist/ 2025-05-07T19:42:58.4205923Z Removing fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:42:58.4207102Z Removing fbgemm_gpu/fbgemm_gpu_nightly.egg-info/ 2025-05-07T19:42:58.4207994Z [command]/usr/bin/git reset --hard HEAD 2025-05-07T19:42:58.5782546Z HEAD is now at 1c9ad64 Merge f6528e7b1e8f5602e7dba30cd73b48ae6630981c into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:58.5786159Z ##[endgroup] 2025-05-07T19:42:58.5787398Z 
##[group]Disabling automatic garbage collection 2025-05-07T19:42:58.5790580Z [command]/usr/bin/git config --local gc.auto 0 2025-05-07T19:42:58.5820233Z ##[endgroup] 2025-05-07T19:42:58.5821396Z ##[group]Setting up auth 2025-05-07T19:42:58.5822686Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T19:42:58.5849009Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T19:42:58.6131921Z Entering 'external/asmjit' 2025-05-07T19:42:58.6180367Z Entering 'external/composable_kernel' 2025-05-07T19:42:58.6232570Z Entering 'external/cpuinfo' 2025-05-07T19:42:58.6283297Z Entering 'external/cutlass' 2025-05-07T19:42:58.6342334Z Entering 'external/googletest' 2025-05-07T19:42:58.6386428Z Entering 'external/hipify_torch' 2025-05-07T19:42:58.6461047Z Entering 'external/json' 2025-05-07T19:42:58.6536734Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T19:42:58.6578064Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T19:42:58.6846284Z Entering 'external/asmjit' 2025-05-07T19:42:58.6893787Z Entering 'external/composable_kernel' 2025-05-07T19:42:58.6950966Z Entering 'external/cpuinfo' 2025-05-07T19:42:58.7000045Z Entering 'external/cutlass' 2025-05-07T19:42:58.7071767Z Entering 'external/googletest' 2025-05-07T19:42:58.7124118Z Entering 'external/hipify_torch' 2025-05-07T19:42:58.7189718Z Entering 'external/json' 2025-05-07T19:42:58.7254974Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:58.7296708Z ##[endgroup] 2025-05-07T19:42:58.7297449Z ##[group]Fetching the repository 
2025-05-07T19:42:58.7301190Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +a2f4c52051596e74bc8c16e3d2867a4ecdd271e0:refs/remotes/pull/4066/merge 2025-05-07T19:42:58.9221400Z From https://github.com/pytorch/FBGEMM 2025-05-07T19:42:58.9222247Z + 1c9ad64...a2f4c52 a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 -> pull/4066/merge (forced update) 2025-05-07T19:42:58.9242070Z ##[endgroup] 2025-05-07T19:42:58.9242499Z ##[group]Determining the checkout info 2025-05-07T19:42:58.9242994Z ##[endgroup] 2025-05-07T19:42:58.9243522Z [command]/usr/bin/git sparse-checkout disable 2025-05-07T19:42:58.9753192Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-05-07T19:42:58.9780290Z ##[group]Checking out the ref 2025-05-07T19:42:58.9781247Z [command]/usr/bin/git checkout --progress --force refs/remotes/pull/4066/merge 2025-05-07T19:42:59.0765844Z Warning: you are leaving 1 commit behind, not connected to 2025-05-07T19:42:59.0766489Z any of your branches: 2025-05-07T19:42:59.0766704Z 2025-05-07T19:42:59.0767091Z 1c9ad64 Merge f6528e7b1e8f5602e7dba30cd73b48ae6630981c into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:59.0767592Z 2025-05-07T19:42:59.0767804Z If you want to keep it by creating a new branch, this may be a good time 2025-05-07T19:42:59.0768211Z to do so with: 2025-05-07T19:42:59.0768368Z 2025-05-07T19:42:59.0768506Z git branch 1c9ad64 2025-05-07T19:42:59.0768717Z 2025-05-07T19:42:59.0773658Z HEAD is now at a2f4c52 Merge 6060cd4b5f971680caecdcc657faccb5720d1c3e into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:59.0779301Z ##[endgroup] 2025-05-07T19:42:59.0779845Z ##[group]Setting up auth for fetching submodules 2025-05-07T19:42:59.0783744Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:59.0832086Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 
2025-05-07T19:42:59.0861449Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-05-07T19:42:59.0889407Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-05-07T19:42:59.0909134Z ##[endgroup] 2025-05-07T19:42:59.0909666Z ##[group]Fetching submodules 2025-05-07T19:42:59.0911944Z [command]/usr/bin/git submodule sync 2025-05-07T19:42:59.1220659Z Synchronizing submodule url for 'external/asmjit' 2025-05-07T19:42:59.1221386Z Synchronizing submodule url for 'external/composable_kernel' 2025-05-07T19:42:59.1221870Z Synchronizing submodule url for 'external/cpuinfo' 2025-05-07T19:42:59.1222402Z Synchronizing submodule url for 'external/cutlass' 2025-05-07T19:42:59.1223133Z Synchronizing submodule url for 'external/googletest' 2025-05-07T19:42:59.1223582Z Synchronizing submodule url for 'external/hipify_torch' 2025-05-07T19:42:59.1224242Z Synchronizing submodule url for 'external/json' 2025-05-07T19:42:59.1225906Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 2025-05-07T19:42:59.2592117Z Submodule path 'external/asmjit': checked out 'e5d7c0bd5d9aec44d68830187138149e6a8c4e32' 2025-05-07T19:42:59.5351735Z Submodule path 'external/composable_kernel': checked out '4a61bdd4bd4ed730e078aebc7c0fcf046ff29406' 2025-05-07T19:42:59.6586829Z Submodule path 'external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-05-07T19:43:00.4343888Z Submodule path 'external/cutlass': checked out '3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3' 2025-05-07T19:43:00.4846975Z Submodule path 'external/googletest': checked out 'f8d7d77c06936315286eb55f8de22cd23c188571' 2025-05-07T19:43:00.4979838Z Submodule path 'external/hipify_torch': checked out '420084499c7c1e1c2d801922f40df202eac5f3a0' 2025-05-07T19:43:00.6466188Z Submodule path 'external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-05-07T19:43:00.6477660Z 
[command]/usr/bin/git submodule foreach git config --local gc.auto 0 2025-05-07T19:43:00.6765755Z Entering 'external/asmjit' 2025-05-07T19:43:00.6786116Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.6821051Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.6855309Z Entering 'external/cutlass' 2025-05-07T19:43:00.6886461Z Entering 'external/googletest' 2025-05-07T19:43:00.6916631Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.6949560Z Entering 'external/json' 2025-05-07T19:43:00.6986701Z ##[endgroup] 2025-05-07T19:43:00.6987152Z ##[group]Persisting credentials for submodules 2025-05-07T19:43:00.6991712Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-05-07T19:43:00.7261582Z Entering 'external/asmjit' 2025-05-07T19:43:00.7296137Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7296632Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7341526Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.7375487Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7376195Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7421469Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.7467072Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7468836Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7501678Z Entering 'external/cutlass' 2025-05-07T19:43:00.7536956Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7537617Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7583112Z Entering 'external/googletest' 2025-05-07T19:43:00.7634498Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7635148Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7670469Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.7721032Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7721651Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7758599Z 
Entering 'external/json' 2025-05-07T19:43:00.7787904Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7788304Z url.https://github.com/.insteadof 2025-05-07T19:43:00.7846576Z [command]/usr/bin/git submodule foreach sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-05-07T19:43:00.8121200Z Entering 'external/asmjit' 2025-05-07T19:43:00.8182201Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/asmjit/config remote.origin.url 2025-05-07T19:43:00.8182735Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.8237239Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/composable_kernel/config remote.origin.url 2025-05-07T19:43:00.8239672Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.8292645Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cpuinfo/config remote.origin.url 2025-05-07T19:43:00.8295164Z Entering 'external/cutlass' 2025-05-07T19:43:00.8346970Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cutlass/config remote.origin.url 2025-05-07T19:43:00.8352033Z Entering 'external/googletest' 2025-05-07T19:43:00.8401906Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/googletest/config remote.origin.url 2025-05-07T19:43:00.8404240Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.8456050Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/hipify_torch/config remote.origin.url 2025-05-07T19:43:00.8457886Z Entering 'external/json' 2025-05-07T19:43:00.8508652Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/json/config remote.origin.url 2025-05-07T19:43:00.8662426Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-05-07T19:43:00.8964577Z Entering 'external/asmjit' 2025-05-07T19:43:00.8987253Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.9012243Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.9039802Z Entering 'external/cutlass' 2025-05-07T19:43:00.9062152Z Entering 
'external/googletest' 2025-05-07T19:43:00.9096186Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.9131854Z Entering 'external/json' 2025-05-07T19:43:00.9173079Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-05-07T19:43:00.9469788Z Entering 'external/asmjit' 2025-05-07T19:43:00.9496915Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.9530500Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.9559614Z Entering 'external/cutlass' 2025-05-07T19:43:00.9588888Z Entering 'external/googletest' 2025-05-07T19:43:00.9611699Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.9642546Z Entering 'external/json' 2025-05-07T19:43:00.9688162Z ##[endgroup] 2025-05-07T19:43:00.9721898Z [command]/usr/bin/git log -1 --format=%H 2025-05-07T19:43:00.9742569Z a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:43:00.9897981Z ##[group]Run . $PRELUDE; print_system_info 2025-05-07T19:43:00.9898390Z . $PRELUDE; print_system_info 2025-05-07T19:43:00.9898874Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:00.9899249Z env: 2025-05-07T19:43:00.9899592Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:00.9899909Z BUILD_ENV: build_binary 2025-05-07T19:43:00.9900179Z BUILD_TARGET: genai 2025-05-07T19:43:00.9900581Z BUILD_VARIANT: cuda 2025-05-07T19:43:00.9900869Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:00.9901127Z ##[endgroup] 2025-05-07T19:43:01.3996762Z ################################################################################ 2025-05-07T19:43:01.3997241Z # Print System Info 2025-05-07T19:43:01.3997551Z # 2025-05-07T19:43:01.4021566Z # [2025-05-07T19:43:01.401Z] + print_system_info 2025-05-07T19:43:01.4021997Z ################################################################################ 2025-05-07T19:43:01.4022298Z 2025-05-07T19:43:01.4022555Z ################################################################################ 2025-05-07T19:43:01.4022947Z 
[INFO] Printing environment variables ... 2025-05-07T19:43:01.4023353Z + printenv 2025-05-07T19:43:01.4023490Z 2025-05-07T19:43:01.4049579Z GITHUB_WORKSPACE=/__w/FBGEMM/FBGEMM 2025-05-07T19:43:01.4050059Z BUILD_VARIANT=cuda 2025-05-07T19:43:01.4050378Z HOSTNAME=565b81b7c816 2025-05-07T19:43:01.4050911Z GITHUB_PATH=/__w/_temp/_runner_file_commands/add_path_a172128f-4127-4ddc-adaa-06bdbf4febfc 2025-05-07T19:43:01.4051483Z GITHUB_ACTION=__run_2 2025-05-07T19:43:01.4051763Z GITHUB_RUN_NUMBER=10601 2025-05-07T19:43:01.4052090Z RUNNER_NAME=i-061ecfb3f7340882c 2025-05-07T19:43:01.4052461Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-05-07T19:43:01.4052807Z PLATFORM_NAME_LC=linux-x86_64 2025-05-07T19:43:01.4053135Z MACHINE_NAME_LC=x86_64 2025-05-07T19:43:01.4053417Z GITHUB_TRIGGERING_ACTOR=q10 2025-05-07T19:43:01.4053765Z PRELUDE=.github/scripts/setup_env.bash 2025-05-07T19:43:01.4054107Z GITHUB_REF_TYPE=branch 2025-05-07T19:43:01.4054760Z *** 2025-05-07T19:43:01.4055003Z GITHUB_REPOSITORY_ID=150154628 2025-05-07T19:43:01.4055334Z GITHUB_ACTIONS=true 2025-05-07T19:43:01.4055670Z GITHUB_SHA=a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:43:01.4056545Z GITHUB_WORKFLOW_REF=pytorch/FBGEMM/.github/workflows/fbgemm_gpu_ci_cuda.yml@refs/pull/4066/merge 2025-05-07T19:43:01.4057170Z RUNNER_ENVIRONMENT=self-hosted 2025-05-07T19:43:01.4057483Z GITHUB_REF=refs/pull/4066/merge 2025-05-07T19:43:01.4057801Z RUNNER_OS=Linux 2025-05-07T19:43:01.4058059Z GITHUB_REF_PROTECTED=false 2025-05-07T19:43:01.4058366Z HOME=/github/home 2025-05-07T19:43:01.4058714Z GITHUB_API_URL=https://api.github.com 2025-05-07T19:43:01.4059070Z RUNNER_ARCH=X64 2025-05-07T19:43:01.4059316Z RUNNER_TEMP=/__w/_temp 2025-05-07T19:43:01.4059701Z BUILD_TARGET=genai 2025-05-07T19:43:01.4060185Z GITHUB_STATE=/__w/_temp/_runner_file_commands/save_state_a172128f-4127-4ddc-adaa-06bdbf4febfc 2025-05-07T19:43:01.4060882Z GITHUB_ENV=/__w/_temp/_runner_file_commands/set_env_a172128f-4127-4ddc-adaa-06bdbf4febfc 
2025-05-07T19:43:01.4061447Z GITHUB_EVENT_PATH=/github/workflow/event.json 2025-05-07T19:43:01.4061806Z GITHUB_EVENT_NAME=pull_request 2025-05-07T19:43:01.4062135Z GITHUB_RUN_ID=14891846252 2025-05-07T19:43:01.4062648Z GITHUB_STEP_SUMMARY=/__w/_temp/_runner_file_commands/step_summary_a172128f-4127-4ddc-adaa-06bdbf4febfc 2025-05-07T19:43:01.4063224Z BUILD_ENV=build_binary 2025-05-07T19:43:01.4063514Z GITHUB_ACTOR=q10 2025-05-07T19:43:01.4063763Z GITHUB_RUN_ATTEMPT=1 2025-05-07T19:43:01.4064044Z KERN_NAME_LC=linux 2025-05-07T19:43:01.4064301Z BUILD_CUDA_VERSION=12.8.0 2025-05-07T19:43:01.4064668Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-05-07T19:43:01.4065053Z PLATFORM_NAME=Linux-x86_64 2025-05-07T19:43:01.4065396Z GITHUB_SERVER_URL=https://github.com 2025-05-07T19:43:01.4065712Z SHLVL=1 2025-05-07T19:43:01.4065972Z GITHUB_ACTOR_ID=255046 2025-05-07T19:43:01.4066242Z RUNNER_TOOL_CACHE=/__w/_tool 2025-05-07T19:43:01.4066824Z GITHUB_WORKFLOW_SHA=6060cd4b5f971680caecdcc657faccb5720d1c3e 2025-05-07T19:43:01.4067274Z GITHUB_REF_NAME=4066/merge 2025-05-07T19:43:01.4067550Z KERN_NAME=Linux 2025-05-07T19:43:01.4067840Z GITHUB_JOB=build_artifact 2025-05-07T19:43:01.4068146Z GITHUB_REPOSITORY=pytorch/FBGEMM 2025-05-07T19:43:01.4068494Z GITHUB_RETENTION_DAYS=90 2025-05-07T19:43:01.4068787Z RUNNER_WORKSPACE=/__w/FBGEMM 2025-05-07T19:43:01.4069119Z GITHUB_ACTION_REPOSITORY= 2025-05-07T19:43:01.4069506Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:43:01.4069954Z GITHUB_BASE_REF=main 2025-05-07T19:43:01.4070220Z CI=true 2025-05-07T19:43:01.4070486Z GITHUB_REPOSITORY_OWNER=pytorch 2025-05-07T19:43:01.4070831Z GITHUB_HEAD_REF=bm/genai-rocm-oss-6 2025-05-07T19:43:01.4071146Z GITHUB_ACTION_REF= 2025-05-07T19:43:01.4071463Z GITHUB_WORKFLOW=FBGEMM GPU/GenAI CUDA CI 2025-05-07T19:43:01.4072005Z GITHUB_OUTPUT=/__w/_temp/_runner_file_commands/set_output_a172128f-4127-4ddc-adaa-06bdbf4febfc 2025-05-07T19:43:01.4072550Z MACHINE_NAME=x86_64 
2025-05-07T19:43:01.4072858Z _=/usr/bin/printenv 2025-05-07T19:43:01.4073039Z 2025-05-07T19:43:01.4073169Z ################################################################################ 2025-05-07T19:43:01.4073529Z [INFO] Print ldd version ... 2025-05-07T19:43:01.4073845Z + ldd --version 2025-05-07T19:43:01.4073990Z 2025-05-07T19:43:01.4074122Z ldd (GNU libc) 2.34 2025-05-07T19:43:01.4074431Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:43:01.4074949Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:43:01.4075548Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:43:01.4076076Z Written by Roland McGrath and Ulrich Drepper. 2025-05-07T19:43:01.4076324Z 2025-05-07T19:43:01.4076480Z ################################################################################ 2025-05-07T19:43:01.4076827Z [INFO] Print CPU info ... 2025-05-07T19:43:01.4077125Z + nproc 2025-05-07T19:43:01.4077247Z 2025-05-07T19:43:01.4099613Z 96 2025-05-07T19:43:01.4099888Z 2025-05-07T19:43:01.4100071Z + lscpu 2025-05-07T19:43:01.4100204Z 2025-05-07T19:43:01.4358411Z Architecture: x86_64 2025-05-07T19:43:01.4360897Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:43:01.4362139Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4363346Z Byte Order: Little Endian 2025-05-07T19:43:01.4364017Z CPU(s): 96 2025-05-07T19:43:01.4364312Z On-line CPU(s) list: 0-95 2025-05-07T19:43:01.4364648Z Vendor ID: GenuineIntel 2025-05-07T19:43:01.4365034Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4365426Z CPU family: 6 2025-05-07T19:43:01.4365723Z Model: 85 2025-05-07T19:43:01.4366005Z Thread(s) per core: 2 2025-05-07T19:43:01.4366344Z Core(s) per socket: 24 2025-05-07T19:43:01.4366628Z Socket(s): 2 2025-05-07T19:43:01.4366922Z Stepping: 7 2025-05-07T19:43:01.4367243Z BogoMIPS: 5999.99 2025-05-07T19:43:01.4369710Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr 
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:01.4372147Z Hypervisor vendor: KVM 2025-05-07T19:43:01.4372725Z Virtualization type: full 2025-05-07T19:43:01.4373279Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:43:01.4374055Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:43:01.4374647Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:43:01.4375027Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:43:01.4375514Z NUMA node(s): 2 2025-05-07T19:43:01.4375860Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:43:01.4376501Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:43:01.4377332Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:43:01.4377926Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:43:01.4378478Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:43:01.4379122Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:43:01.4379890Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:43:01.4380557Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:43:01.4381208Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:43:01.4381633Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:43:01.4382030Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:43:01.4382448Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:43:01.4383088Z 
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:43:01.4384240Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:43:01.4385400Z Vulnerability Srbds: Not affected 2025-05-07T19:43:01.4385814Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:43:01.4386105Z 2025-05-07T19:43:01.4386209Z + cat /proc/cpuinfo 2025-05-07T19:43:01.4386356Z 2025-05-07T19:43:01.4386448Z processor : 0 2025-05-07T19:43:01.4386804Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4387070Z cpu family : 6 2025-05-07T19:43:01.4387329Z model : 85 2025-05-07T19:43:01.4387649Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4388031Z stepping : 7 2025-05-07T19:43:01.4388275Z microcode : 0x5003901 2025-05-07T19:43:01.4388519Z cpu MHz : 1202.161 2025-05-07T19:43:01.4388783Z cache size : 36608 KB 2025-05-07T19:43:01.4389029Z physical id : 0 2025-05-07T19:43:01.4389276Z siblings : 48 2025-05-07T19:43:01.4389519Z core id : 0 2025-05-07T19:43:01.4389743Z cpu cores : 24 2025-05-07T19:43:01.4389985Z apicid : 0 2025-05-07T19:43:01.4390200Z initial apicid : 0 2025-05-07T19:43:01.4390455Z fpu : yes 2025-05-07T19:43:01.4390677Z fpu_exception : yes 2025-05-07T19:43:01.4390935Z cpuid level : 13 2025-05-07T19:43:01.4391157Z wp : yes 2025-05-07T19:43:01.4393557Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku 
ospke avx512_vnni 2025-05-07T19:43:01.4396328Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4396952Z bogomips : 5999.99 2025-05-07T19:43:01.4397209Z clflush size : 64 2025-05-07T19:43:01.4397460Z cache_alignment : 64 2025-05-07T19:43:01.4397754Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4398196Z power management: 2025-05-07T19:43:01.4398350Z 2025-05-07T19:43:01.4398441Z processor : 1 2025-05-07T19:43:01.4398701Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4398955Z cpu family : 6 2025-05-07T19:43:01.4399189Z model : 85 2025-05-07T19:43:01.4399485Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4399886Z stepping : 7 2025-05-07T19:43:01.4400106Z microcode : 0x5003901 2025-05-07T19:43:01.4400569Z cpu MHz : 2999.998 2025-05-07T19:43:01.4400829Z cache size : 36608 KB 2025-05-07T19:43:01.4401109Z physical id : 0 2025-05-07T19:43:01.4401407Z siblings : 48 2025-05-07T19:43:01.4401631Z core id : 1 2025-05-07T19:43:01.4401855Z cpu cores : 24 2025-05-07T19:43:01.4402071Z apicid : 2 2025-05-07T19:43:01.4402306Z initial apicid : 2 2025-05-07T19:43:01.4402529Z fpu : yes 2025-05-07T19:43:01.4402753Z fpu_exception : yes 2025-05-07T19:43:01.4402992Z cpuid level : 13 2025-05-07T19:43:01.4403231Z wp : yes 2025-05-07T19:43:01.4405615Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:01.4408390Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4409004Z bogomips : 5999.99 2025-05-07T19:43:01.4409253Z clflush size : 64 2025-05-07T19:43:01.4409483Z cache_alignment : 64 2025-05-07T19:43:01.4409795Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4410157Z power management: 2025-05-07T19:43:01.4410326Z 2025-05-07T19:43:01.4410421Z processor : 2 2025-05-07T19:43:01.4410684Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4411019Z cpu family : 6 2025-05-07T19:43:01.4411235Z model : 85 2025-05-07T19:43:01.4411748Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4412246Z stepping : 7 2025-05-07T19:43:01.4412636Z microcode : 0x5003901 2025-05-07T19:43:01.4413107Z cpu MHz : 2999.998 2025-05-07T19:43:01.4413344Z cache size : 36608 KB 2025-05-07T19:43:01.4413598Z physical id : 0 2025-05-07T19:43:01.4413816Z siblings : 48 2025-05-07T19:43:01.4414042Z core id : 2 2025-05-07T19:43:01.4414330Z cpu cores : 24 2025-05-07T19:43:01.4414557Z apicid : 4 2025-05-07T19:43:01.4414765Z initial apicid : 4 2025-05-07T19:43:01.4415006Z fpu : yes 2025-05-07T19:43:01.4415214Z fpu_exception : yes 2025-05-07T19:43:01.4415461Z cpuid level : 13 2025-05-07T19:43:01.4415696Z wp : yes 2025-05-07T19:43:01.4418062Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4420933Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4421565Z bogomips : 5999.99 2025-05-07T19:43:01.4421796Z clflush size : 64 2025-05-07T19:43:01.4422051Z cache_alignment : 64 2025-05-07T19:43:01.4422344Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4422709Z power management: 2025-05-07T19:43:01.4422850Z 2025-05-07T19:43:01.4422939Z processor : 3 2025-05-07T19:43:01.4423331Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4423589Z cpu family : 6 2025-05-07T19:43:01.4423824Z model : 85 2025-05-07T19:43:01.4424131Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4424504Z stepping : 7 2025-05-07T19:43:01.4424749Z microcode : 0x5003901 2025-05-07T19:43:01.4424998Z cpu MHz : 2999.998 2025-05-07T19:43:01.4425252Z cache size : 36608 KB 2025-05-07T19:43:01.4425494Z physical id : 0 2025-05-07T19:43:01.4425740Z siblings : 48 2025-05-07T19:43:01.4425960Z core id : 3 2025-05-07T19:43:01.4426199Z cpu cores : 24 2025-05-07T19:43:01.4426419Z apicid : 6 2025-05-07T19:43:01.4426657Z initial apicid : 6 2025-05-07T19:43:01.4426891Z fpu : yes 2025-05-07T19:43:01.4427134Z fpu_exception : yes 2025-05-07T19:43:01.4427393Z cpuid level : 13 2025-05-07T19:43:01.4427624Z wp : yes 2025-05-07T19:43:01.4429986Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4432756Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4433373Z bogomips : 5999.99 2025-05-07T19:43:01.4433641Z clflush size : 64 2025-05-07T19:43:01.4434001Z cache_alignment : 64 2025-05-07T19:43:01.4434318Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4434663Z power management: 2025-05-07T19:43:01.4434834Z 2025-05-07T19:43:01.4434927Z processor : 4 2025-05-07T19:43:01.4435158Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4435438Z cpu family : 6 2025-05-07T19:43:01.4435682Z model : 85 2025-05-07T19:43:01.4435975Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4436416Z stepping : 7 2025-05-07T19:43:01.4436641Z microcode : 0x5003901 2025-05-07T19:43:01.4436901Z cpu MHz : 2999.998 2025-05-07T19:43:01.4437122Z cache size : 36608 KB 2025-05-07T19:43:01.4437377Z physical id : 0 2025-05-07T19:43:01.4437591Z siblings : 48 2025-05-07T19:43:01.4437825Z core id : 4 2025-05-07T19:43:01.4438030Z cpu cores : 24 2025-05-07T19:43:01.4438258Z apicid : 8 2025-05-07T19:43:01.4438461Z initial apicid : 8 2025-05-07T19:43:01.4438699Z fpu : yes 2025-05-07T19:43:01.4438912Z fpu_exception : yes 2025-05-07T19:43:01.4439152Z cpuid level : 13 2025-05-07T19:43:01.4439387Z wp : yes 2025-05-07T19:43:01.4441676Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4444331Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4444960Z bogomips : 5999.99 2025-05-07T19:43:01.4445193Z clflush size : 64 2025-05-07T19:43:01.4445444Z cache_alignment : 64 2025-05-07T19:43:01.4445734Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4446082Z power management: 2025-05-07T19:43:01.4446220Z 2025-05-07T19:43:01.4446309Z processor : 5 2025-05-07T19:43:01.4446549Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4446794Z cpu family : 6 2025-05-07T19:43:01.4447088Z model : 85 2025-05-07T19:43:01.4447455Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4447823Z stepping : 7 2025-05-07T19:43:01.4448070Z microcode : 0x5003901 2025-05-07T19:43:01.4448319Z cpu MHz : 2999.998 2025-05-07T19:43:01.4448572Z cache size : 36608 KB 2025-05-07T19:43:01.4448811Z physical id : 0 2025-05-07T19:43:01.4449052Z siblings : 48 2025-05-07T19:43:01.4449262Z core id : 5 2025-05-07T19:43:01.4449488Z cpu cores : 24 2025-05-07T19:43:01.4449698Z apicid : 10 2025-05-07T19:43:01.4449927Z initial apicid : 10 2025-05-07T19:43:01.4450154Z fpu : yes 2025-05-07T19:43:01.4450380Z fpu_exception : yes 2025-05-07T19:43:01.4450625Z cpuid level : 13 2025-05-07T19:43:01.4450840Z wp : yes 2025-05-07T19:43:01.4453148Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4455821Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4456417Z bogomips : 5999.99 2025-05-07T19:43:01.4456664Z clflush size : 64 2025-05-07T19:43:01.4456897Z cache_alignment : 64 2025-05-07T19:43:01.4457198Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4457530Z power management: 2025-05-07T19:43:01.4457687Z 2025-05-07T19:43:01.4457777Z processor : 6 2025-05-07T19:43:01.4458003Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4458268Z cpu family : 6 2025-05-07T19:43:01.4458501Z model : 85 2025-05-07T19:43:01.4458786Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4459160Z stepping : 7 2025-05-07T19:43:01.4459373Z microcode : 0x5003901 2025-05-07T19:43:01.4459729Z cpu MHz : 2999.998 2025-05-07T19:43:01.4460204Z cache size : 36608 KB 2025-05-07T19:43:01.4460460Z physical id : 0 2025-05-07T19:43:01.4460675Z siblings : 48 2025-05-07T19:43:01.4460910Z core id : 6 2025-05-07T19:43:01.4461121Z cpu cores : 24 2025-05-07T19:43:01.4461356Z apicid : 12 2025-05-07T19:43:01.4461571Z initial apicid : 12 2025-05-07T19:43:01.4461816Z fpu : yes 2025-05-07T19:43:01.4462046Z fpu_exception : yes 2025-05-07T19:43:01.4462279Z cpuid level : 13 2025-05-07T19:43:01.4462528Z wp : yes 2025-05-07T19:43:01.4464888Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4467641Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4468226Z bogomips : 5999.99 2025-05-07T19:43:01.4468445Z clflush size : 64 2025-05-07T19:43:01.4468680Z cache_alignment : 64 2025-05-07T19:43:01.4468952Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4469285Z power management: 2025-05-07T19:43:01.4469416Z 2025-05-07T19:43:01.4469502Z processor : 7 2025-05-07T19:43:01.4469736Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4470005Z cpu family : 6 2025-05-07T19:43:01.4470206Z model : 85 2025-05-07T19:43:01.4470496Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4470905Z stepping : 7 2025-05-07T19:43:01.4471136Z microcode : 0x5003901 2025-05-07T19:43:01.4471362Z cpu MHz : 1200.558 2025-05-07T19:43:01.4471597Z cache size : 36608 KB 2025-05-07T19:43:01.4471828Z physical id : 0 2025-05-07T19:43:01.4472063Z siblings : 48 2025-05-07T19:43:01.4472264Z core id : 7 2025-05-07T19:43:01.4472489Z cpu cores : 24 2025-05-07T19:43:01.4472696Z apicid : 14 2025-05-07T19:43:01.4472931Z initial apicid : 14 2025-05-07T19:43:01.4473153Z fpu : yes 2025-05-07T19:43:01.4473384Z fpu_exception : yes 2025-05-07T19:43:01.4473624Z cpuid level : 13 2025-05-07T19:43:01.4473840Z wp : yes 2025-05-07T19:43:01.4476042Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4478588Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4479158Z bogomips : 5999.99 2025-05-07T19:43:01.4479405Z clflush size : 64 2025-05-07T19:43:01.4479630Z cache_alignment : 64 2025-05-07T19:43:01.4479925Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4480243Z power management: 2025-05-07T19:43:01.4480405Z 2025-05-07T19:43:01.4480494Z processor : 8 2025-05-07T19:43:01.4480709Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4480962Z cpu family : 6 2025-05-07T19:43:01.4481184Z model : 85 2025-05-07T19:43:01.4481448Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4481801Z stepping : 7 2025-05-07T19:43:01.4482006Z microcode : 0x5003901 2025-05-07T19:43:01.4482244Z cpu MHz : 1199.976 2025-05-07T19:43:01.4482455Z cache size : 36608 KB 2025-05-07T19:43:01.4482690Z physical id : 0 2025-05-07T19:43:01.4482949Z siblings : 48 2025-05-07T19:43:01.4483161Z core id : 8 2025-05-07T19:43:01.4483358Z cpu cores : 24 2025-05-07T19:43:01.4483575Z apicid : 16 2025-05-07T19:43:01.4483773Z initial apicid : 16 2025-05-07T19:43:01.4484011Z fpu : yes 2025-05-07T19:43:01.4484226Z fpu_exception : yes 2025-05-07T19:43:01.4484440Z cpuid level : 13 2025-05-07T19:43:01.4484670Z wp : yes 2025-05-07T19:43:01.4486833Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4489348Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4489929Z bogomips : 5999.99 2025-05-07T19:43:01.4490142Z clflush size : 64 2025-05-07T19:43:01.4490399Z cache_alignment : 64 2025-05-07T19:43:01.4490668Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4491191Z power management: 2025-05-07T19:43:01.4491330Z 2025-05-07T19:43:01.4491419Z processor : 9 2025-05-07T19:43:01.4491659Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4491925Z cpu family : 6 2025-05-07T19:43:01.4492134Z model : 85 2025-05-07T19:43:01.4492434Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4492820Z stepping : 7 2025-05-07T19:43:01.4493170Z microcode : 0x5003901 2025-05-07T19:43:01.4493755Z cpu MHz : 2999.998 2025-05-07T19:43:01.4494040Z cache size : 36608 KB 2025-05-07T19:43:01.4494273Z physical id : 0 2025-05-07T19:43:01.4494512Z siblings : 48 2025-05-07T19:43:01.4494722Z core id : 9 2025-05-07T19:43:01.4494949Z cpu cores : 24 2025-05-07T19:43:01.4495160Z apicid : 18 2025-05-07T19:43:01.4495390Z initial apicid : 18 2025-05-07T19:43:01.4495616Z fpu : yes 2025-05-07T19:43:01.4495838Z fpu_exception : yes 2025-05-07T19:43:01.4496076Z cpuid level : 13 2025-05-07T19:43:01.4496290Z wp : yes 2025-05-07T19:43:01.4498747Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4502462Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4503330Z bogomips : 5999.99 2025-05-07T19:43:01.4503800Z clflush size : 64 2025-05-07T19:43:01.4504112Z cache_alignment : 64 2025-05-07T19:43:01.4504415Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4504753Z power management: 2025-05-07T19:43:01.4504909Z 2025-05-07T19:43:01.4504999Z processor : 10 2025-05-07T19:43:01.4505225Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4505488Z cpu family : 6 2025-05-07T19:43:01.4505717Z model : 85 2025-05-07T19:43:01.4506001Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4506379Z stepping : 7 2025-05-07T19:43:01.4506593Z microcode : 0x5003901 2025-05-07T19:43:01.4506843Z cpu MHz : 2999.998 2025-05-07T19:43:01.4507076Z cache size : 36608 KB 2025-05-07T19:43:01.4507325Z physical id : 0 2025-05-07T19:43:01.4507543Z siblings : 48 2025-05-07T19:43:01.4507768Z core id : 10 2025-05-07T19:43:01.4508116Z cpu cores : 24 2025-05-07T19:43:01.4508352Z apicid : 20 2025-05-07T19:43:01.4508565Z initial apicid : 20 2025-05-07T19:43:01.4508807Z fpu : yes 2025-05-07T19:43:01.4509031Z fpu_exception : yes 2025-05-07T19:43:01.4509257Z cpuid level : 13 2025-05-07T19:43:01.4509492Z wp : yes 2025-05-07T19:43:01.4511824Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4515096Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4515822Z bogomips : 5999.99 2025-05-07T19:43:01.4516214Z clflush size : 64 2025-05-07T19:43:01.4516661Z cache_alignment : 64 2025-05-07T19:43:01.4516967Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4517317Z power management: 2025-05-07T19:43:01.4517455Z 2025-05-07T19:43:01.4517545Z processor : 11 2025-05-07T19:43:01.4517789Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4518055Z cpu family : 6 2025-05-07T19:43:01.4518264Z model : 85 2025-05-07T19:43:01.4518573Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4518943Z stepping : 7 2025-05-07T19:43:01.4519199Z microcode : 0x5003901 2025-05-07T19:43:01.4519450Z cpu MHz : 1296.100 2025-05-07T19:43:01.4519712Z cache size : 36608 KB 2025-05-07T19:43:01.4520091Z physical id : 0 2025-05-07T19:43:01.4520358Z siblings : 48 2025-05-07T19:43:01.4520581Z core id : 11 2025-05-07T19:43:01.4520837Z cpu cores : 24 2025-05-07T19:43:01.4521066Z apicid : 22 2025-05-07T19:43:01.4521335Z initial apicid : 22 2025-05-07T19:43:01.4521572Z fpu : yes 2025-05-07T19:43:01.4521824Z fpu_exception : yes 2025-05-07T19:43:01.4522074Z cpuid level : 13 2025-05-07T19:43:01.4522291Z wp : yes 2025-05-07T19:43:01.4524597Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4527285Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4527888Z bogomips : 5999.99 2025-05-07T19:43:01.4528131Z clflush size : 64 2025-05-07T19:43:01.4529307Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:01.4530100Z cache_alignment : 64 2025-05-07T19:43:01.4530537Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4530885Z power management: 2025-05-07T19:43:01.4531025Z 2025-05-07T19:43:01.4531139Z processor : 12 2025-05-07T19:43:01.4531372Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4531666Z cpu family : 6 2025-05-07T19:43:01.4531895Z model : 85 2025-05-07T19:43:01.4532223Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4532603Z stepping : 7 2025-05-07T19:43:01.4532860Z microcode : 0x5003901 2025-05-07T19:43:01.4533114Z cpu MHz : 1207.400 2025-05-07T19:43:01.4533381Z cache size : 36608 KB 2025-05-07T19:43:01.4533637Z physical id : 0 2025-05-07T19:43:01.4533897Z siblings : 48 2025-05-07T19:43:01.4534125Z core id : 12 2025-05-07T19:43:01.4534376Z cpu cores : 24 2025-05-07T19:43:01.4534698Z apicid : 24 2025-05-07T19:43:01.4534934Z initial apicid : 24 2025-05-07T19:43:01.4535203Z fpu : yes 2025-05-07T19:43:01.4535431Z fpu_exception : yes 2025-05-07T19:43:01.4535692Z cpuid level : 13 2025-05-07T19:43:01.4535912Z wp : yes 2025-05-07T19:43:01.4538284Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw 
avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:01.4541438Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4542194Z bogomips : 5999.99 2025-05-07T19:43:01.4542565Z clflush size : 64 2025-05-07T19:43:01.4542993Z cache_alignment : 64 2025-05-07T19:43:01.4543467Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4543807Z power management: 2025-05-07T19:43:01.4543961Z 2025-05-07T19:43:01.4544051Z processor : 13 2025-05-07T19:43:01.4544294Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4544542Z cpu family : 6 2025-05-07T19:43:01.4544770Z model : 85 2025-05-07T19:43:01.4545055Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4545436Z stepping : 7 2025-05-07T19:43:01.4545654Z microcode : 0x5003901 2025-05-07T19:43:01.4545903Z cpu MHz : 2999.998 2025-05-07T19:43:01.4546132Z cache size : 36608 KB 2025-05-07T19:43:01.4546383Z physical id : 0 2025-05-07T19:43:01.4546686Z siblings : 48 2025-05-07T19:43:01.4546919Z core id : 13 2025-05-07T19:43:01.4547146Z cpu cores : 24 2025-05-07T19:43:01.4547361Z apicid : 26 2025-05-07T19:43:01.4547599Z initial apicid : 26 2025-05-07T19:43:01.4547823Z fpu : yes 2025-05-07T19:43:01.4548050Z fpu_exception : yes 2025-05-07T19:43:01.4548275Z cpuid level : 13 2025-05-07T19:43:01.4548508Z wp : yes 2025-05-07T19:43:01.4550865Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl 
xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:01.4554091Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4554768Z bogomips : 5999.99 2025-05-07T19:43:01.4555159Z clflush size : 64 2025-05-07T19:43:01.4555598Z cache_alignment : 64 2025-05-07T19:43:01.4555886Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4556230Z power management: 2025-05-07T19:43:01.4556365Z 2025-05-07T19:43:01.4556470Z processor : 14 2025-05-07T19:43:01.4556690Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4556948Z cpu family : 6 2025-05-07T19:43:01.4557155Z model : 85 2025-05-07T19:43:01.4557625Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4557998Z stepping : 7 2025-05-07T19:43:01.4558260Z microcode : 0x5003901 2025-05-07T19:43:01.4558503Z cpu MHz : 2999.998 2025-05-07T19:43:01.4558742Z cache size : 36608 KB 2025-05-07T19:43:01.4558975Z physical id : 0 2025-05-07T19:43:01.4559237Z siblings : 48 2025-05-07T19:43:01.4559453Z core id : 14 2025-05-07T19:43:01.4559678Z cpu cores : 24 2025-05-07T19:43:01.4559907Z apicid : 28 2025-05-07T19:43:01.4560121Z initial apicid : 28 2025-05-07T19:43:01.4560455Z fpu : yes 2025-05-07T19:43:01.4560669Z fpu_exception : yes 2025-05-07T19:43:01.4560939Z cpuid level : 13 2025-05-07T19:43:01.4561164Z wp : yes 2025-05-07T19:43:01.4563506Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt 
xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:01.4566237Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4566920Z bogomips : 5999.99 2025-05-07T19:43:01.4567179Z clflush size : 64 2025-05-07T19:43:01.4567411Z cache_alignment : 64 2025-05-07T19:43:01.4567726Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4568085Z power management: 2025-05-07T19:43:01.4568230Z 2025-05-07T19:43:01.4568322Z processor : 15 2025-05-07T19:43:01.4568577Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4568832Z cpu family : 6 2025-05-07T19:43:01.4569071Z model : 85 2025-05-07T19:43:01.4569366Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4569756Z stepping : 7 2025-05-07T19:43:01.4569975Z microcode : 0x5003901 2025-05-07T19:43:01.4570242Z cpu MHz : 1204.484 2025-05-07T19:43:01.4570474Z cache size : 36608 KB 2025-05-07T19:43:01.4570736Z physical id : 0 2025-05-07T19:43:01.4570965Z siblings : 48 2025-05-07T19:43:01.4571201Z core id : 15 2025-05-07T19:43:01.4571498Z cpu cores : 24 2025-05-07T19:43:01.4571714Z apicid : 30 2025-05-07T19:43:01.4571946Z initial apicid : 30 2025-05-07T19:43:01.4572166Z fpu : yes 2025-05-07T19:43:01.4572397Z fpu_exception : yes 2025-05-07T19:43:01.4572621Z cpuid level : 13 2025-05-07T19:43:01.4572854Z wp : yes 2025-05-07T19:43:01.4575182Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec 
xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:01.4577914Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4578533Z bogomips : 5999.99 2025-05-07T19:43:01.4578757Z clflush size : 64 2025-05-07T19:43:01.4579004Z cache_alignment : 64 2025-05-07T19:43:01.4579290Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4579781Z power management: 2025-05-07T19:43:01.4579932Z 2025-05-07T19:43:01.4580056Z processor : 16 2025-05-07T19:43:01.4580306Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4580608Z cpu family : 6 2025-05-07T19:43:01.4580839Z model : 85 2025-05-07T19:43:01.4581177Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4581552Z stepping : 7 2025-05-07T19:43:01.4581810Z microcode : 0x5003901 2025-05-07T19:43:01.4582064Z cpu MHz : 2999.998 2025-05-07T19:43:01.4582309Z cache size : 36608 KB 2025-05-07T19:43:01.4582544Z physical id : 0 2025-05-07T19:43:01.4582783Z siblings : 48 2025-05-07T19:43:01.4583009Z core id : 16 2025-05-07T19:43:01.4583222Z cpu cores : 24 2025-05-07T19:43:01.4583455Z apicid : 32 2025-05-07T19:43:01.4583668Z initial apicid : 32 2025-05-07T19:43:01.4583909Z fpu : yes 2025-05-07T19:43:01.4584127Z fpu_exception : yes 2025-05-07T19:43:01.4584464Z cpuid level : 13 2025-05-07T19:43:01.4584700Z wp : yes 2025-05-07T19:43:01.4587078Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 
xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:01.4589828Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4590460Z bogomips : 5999.99 2025-05-07T19:43:01.4590735Z clflush size : 64 2025-05-07T19:43:01.4590983Z cache_alignment : 64 2025-05-07T19:43:01.4591313Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4591694Z power management: 2025-05-07T19:43:01.4591843Z 2025-05-07T19:43:01.4591942Z processor : 17 2025-05-07T19:43:01.4592212Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4592479Z cpu family : 6 2025-05-07T19:43:01.4592732Z model : 85 2025-05-07T19:43:01.4593036Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4593737Z stepping : 7 2025-05-07T19:43:01.4594164Z microcode : 0x5003901 2025-05-07T19:43:01.4594535Z cpu MHz : 2999.998 2025-05-07T19:43:01.4594889Z cache size : 36608 KB 2025-05-07T19:43:01.4595367Z physical id : 0 2025-05-07T19:43:01.4595735Z siblings : 48 2025-05-07T19:43:01.4595965Z core id : 17 2025-05-07T19:43:01.4596212Z cpu cores : 24 2025-05-07T19:43:01.4596435Z apicid : 34 2025-05-07T19:43:01.4596770Z initial apicid : 34 2025-05-07T19:43:01.4597010Z fpu : yes 2025-05-07T19:43:01.4597262Z fpu_exception : yes 2025-05-07T19:43:01.4597501Z cpuid level : 13 2025-05-07T19:43:01.4597761Z wp : yes 2025-05-07T19:43:01.4600520Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida 
arat pku ospke avx512_vnni 2025-05-07T19:43:01.4603262Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4603907Z bogomips : 5999.99 2025-05-07T19:43:01.4604181Z clflush size : 64 2025-05-07T19:43:01.4604429Z cache_alignment : 64 2025-05-07T19:43:01.4604762Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4605122Z power management: 2025-05-07T19:43:01.4605276Z 2025-05-07T19:43:01.4605397Z processor : 18 2025-05-07T19:43:01.4605638Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4605917Z cpu family : 6 2025-05-07T19:43:01.4606124Z model : 85 2025-05-07T19:43:01.4606423Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4606783Z stepping : 7 2025-05-07T19:43:01.4607015Z microcode : 0x5003901 2025-05-07T19:43:01.4607270Z cpu MHz : 2999.998 2025-05-07T19:43:01.4607493Z cache size : 36608 KB 2025-05-07T19:43:01.4607743Z physical id : 0 2025-05-07T19:43:01.4607961Z siblings : 48 2025-05-07T19:43:01.4608190Z core id : 18 2025-05-07T19:43:01.4608404Z cpu cores : 24 2025-05-07T19:43:01.4608632Z apicid : 36 2025-05-07T19:43:01.4608846Z initial apicid : 36 2025-05-07T19:43:01.4609090Z fpu : yes 2025-05-07T19:43:01.4609299Z fpu_exception : yes 2025-05-07T19:43:01.4609552Z cpuid level : 13 2025-05-07T19:43:01.4609769Z wp : yes 2025-05-07T19:43:01.4612148Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku 
ospke avx512_vnni 2025-05-07T19:43:01.4620532Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4621151Z bogomips : 5999.99 2025-05-07T19:43:01.4621409Z clflush size : 64 2025-05-07T19:43:01.4643263Z cache_alignment : 64 2025-05-07T19:43:01.4643725Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4644048Z power management: 2025-05-07T19:43:01.4644200Z 2025-05-07T19:43:01.4644300Z processor : 19 2025-05-07T19:43:01.4644505Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4644758Z cpu family : 6 2025-05-07T19:43:01.4644952Z model : 85 2025-05-07T19:43:01.4645236Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4645575Z stepping : 7 2025-05-07T19:43:01.4645795Z microcode : 0x5003901 2025-05-07T19:43:01.4646015Z cpu MHz : 2999.998 2025-05-07T19:43:01.4646237Z cache size : 36608 KB 2025-05-07T19:43:01.4646472Z physical id : 0 2025-05-07T19:43:01.4646672Z siblings : 48 2025-05-07T19:43:01.4646880Z core id : 19 2025-05-07T19:43:01.4647067Z cpu cores : 24 2025-05-07T19:43:01.4647278Z apicid : 38 2025-05-07T19:43:01.4647475Z initial apicid : 38 2025-05-07T19:43:01.4647701Z fpu : yes 2025-05-07T19:43:01.4647889Z fpu_exception : yes 2025-05-07T19:43:01.4648296Z cpuid level : 13 2025-05-07T19:43:01.4648506Z wp : yes 2025-05-07T19:43:01.4650692Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:01.4653233Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4653794Z bogomips : 5999.99 2025-05-07T19:43:01.4654018Z clflush size : 64 2025-05-07T19:43:01.4654248Z cache_alignment : 64 2025-05-07T19:43:01.4654519Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4654851Z power management: 2025-05-07T19:43:01.4654983Z 2025-05-07T19:43:01.4655062Z processor : 20 2025-05-07T19:43:01.4655287Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4655518Z cpu family : 6 2025-05-07T19:43:01.4655730Z model : 85 2025-05-07T19:43:01.4655995Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4656345Z stepping : 7 2025-05-07T19:43:01.4656542Z microcode : 0x5003901 2025-05-07T19:43:01.4656775Z cpu MHz : 2999.998 2025-05-07T19:43:01.4656993Z cache size : 36608 KB 2025-05-07T19:43:01.4657207Z physical id : 0 2025-05-07T19:43:01.4657422Z siblings : 48 2025-05-07T19:43:01.4657614Z core id : 20 2025-05-07T19:43:01.4657813Z cpu cores : 24 2025-05-07T19:43:01.4658003Z apicid : 40 2025-05-07T19:43:01.4658201Z initial apicid : 40 2025-05-07T19:43:01.4658402Z fpu : yes 2025-05-07T19:43:01.4658604Z fpu_exception : yes 2025-05-07T19:43:01.4658816Z cpuid level : 13 2025-05-07T19:43:01.4659030Z wp : yes 2025-05-07T19:43:01.4661594Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4664399Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4664995Z bogomips : 5999.99 2025-05-07T19:43:01.4665211Z clflush size : 64 2025-05-07T19:43:01.4665425Z cache_alignment : 64 2025-05-07T19:43:01.4665708Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4666031Z power management: 2025-05-07T19:43:01.4666162Z 2025-05-07T19:43:01.4666259Z processor : 21 2025-05-07T19:43:01.4666479Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4666760Z cpu family : 6 2025-05-07T19:43:01.4666977Z model : 85 2025-05-07T19:43:01.4667284Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4667643Z stepping : 7 2025-05-07T19:43:01.4667884Z microcode : 0x5003901 2025-05-07T19:43:01.4668130Z cpu MHz : 1196.670 2025-05-07T19:43:01.4668371Z cache size : 36608 KB 2025-05-07T19:43:01.4668629Z physical id : 0 2025-05-07T19:43:01.4668858Z siblings : 48 2025-05-07T19:43:01.4669105Z core id : 21 2025-05-07T19:43:01.4669330Z cpu cores : 24 2025-05-07T19:43:01.4669590Z apicid : 42 2025-05-07T19:43:01.4669810Z initial apicid : 42 2025-05-07T19:43:01.4670061Z fpu : yes 2025-05-07T19:43:01.4670286Z fpu_exception : yes 2025-05-07T19:43:01.4670555Z cpuid level : 13 2025-05-07T19:43:01.4670784Z wp : yes 2025-05-07T19:43:01.4673249Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4675782Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4676358Z bogomips : 5999.99 2025-05-07T19:43:01.4676608Z clflush size : 64 2025-05-07T19:43:01.4676853Z cache_alignment : 64 2025-05-07T19:43:01.4677134Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4677484Z power management: 2025-05-07T19:43:01.4677625Z 2025-05-07T19:43:01.4677720Z processor : 22 2025-05-07T19:43:01.4677976Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4678227Z cpu family : 6 2025-05-07T19:43:01.4678473Z model : 85 2025-05-07T19:43:01.4678754Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4679116Z stepping : 7 2025-05-07T19:43:01.4679324Z microcode : 0x5003901 2025-05-07T19:43:01.4679569Z cpu MHz : 2999.998 2025-05-07T19:43:01.4679808Z cache size : 36608 KB 2025-05-07T19:43:01.4680031Z physical id : 0 2025-05-07T19:43:01.4680259Z siblings : 48 2025-05-07T19:43:01.4680456Z core id : 22 2025-05-07T19:43:01.4680680Z cpu cores : 24 2025-05-07T19:43:01.4680882Z apicid : 44 2025-05-07T19:43:01.4681110Z initial apicid : 44 2025-05-07T19:43:01.4681322Z fpu : yes 2025-05-07T19:43:01.4681545Z fpu_exception : yes 2025-05-07T19:43:01.4681763Z cpuid level : 13 2025-05-07T19:43:01.4681994Z wp : yes 2025-05-07T19:43:01.4684196Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4686769Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4687368Z bogomips : 5999.99 2025-05-07T19:43:01.4687616Z clflush size : 64 2025-05-07T19:43:01.4687835Z cache_alignment : 64 2025-05-07T19:43:01.4688134Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4688450Z power management: 2025-05-07T19:43:01.4688584Z 2025-05-07T19:43:01.4688703Z processor : 23 2025-05-07T19:43:01.4688927Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4689189Z cpu family : 6 2025-05-07T19:43:01.4689396Z model : 85 2025-05-07T19:43:01.4689700Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4690047Z stepping : 7 2025-05-07T19:43:01.4690283Z microcode : 0x5003901 2025-05-07T19:43:01.4690513Z cpu MHz : 2999.998 2025-05-07T19:43:01.4690752Z cache size : 36608 KB 2025-05-07T19:43:01.4691000Z physical id : 0 2025-05-07T19:43:01.4691210Z siblings : 48 2025-05-07T19:43:01.4691436Z core id : 23 2025-05-07T19:43:01.4691640Z cpu cores : 24 2025-05-07T19:43:01.4691866Z apicid : 46 2025-05-07T19:43:01.4692072Z initial apicid : 46 2025-05-07T19:43:01.4692308Z fpu : yes 2025-05-07T19:43:01.4692508Z fpu_exception : yes 2025-05-07T19:43:01.4692745Z cpuid level : 13 2025-05-07T19:43:01.4692958Z wp : yes 2025-05-07T19:43:01.4695211Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4697747Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4698328Z bogomips : 5999.99 2025-05-07T19:43:01.4698589Z clflush size : 64 2025-05-07T19:43:01.4699000Z cache_alignment : 64 2025-05-07T19:43:01.4699742Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4700489Z power management: 2025-05-07T19:43:01.4700639Z 2025-05-07T19:43:01.4700732Z processor : 24 2025-05-07T19:43:01.4701197Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4701564Z cpu family : 6 2025-05-07T19:43:01.4701864Z model : 85 2025-05-07T19:43:01.4702343Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4702913Z stepping : 7 2025-05-07T19:43:01.4703349Z microcode : 0x5003901 2025-05-07T19:43:01.4703614Z cpu MHz : 3207.775 2025-05-07T19:43:01.4703823Z cache size : 36608 KB 2025-05-07T19:43:01.4704050Z physical id : 1 2025-05-07T19:43:01.4704272Z siblings : 48 2025-05-07T19:43:01.4704470Z core id : 0 2025-05-07T19:43:01.4704677Z cpu cores : 24 2025-05-07T19:43:01.4704878Z apicid : 64 2025-05-07T19:43:01.4705094Z initial apicid : 64 2025-05-07T19:43:01.4705309Z fpu : yes 2025-05-07T19:43:01.4705515Z fpu_exception : yes 2025-05-07T19:43:01.4705728Z cpuid level : 13 2025-05-07T19:43:01.4706097Z wp : yes 2025-05-07T19:43:01.4708678Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4712023Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4712931Z bogomips : 5999.99 2025-05-07T19:43:01.4713149Z clflush size : 64 2025-05-07T19:43:01.4713377Z cache_alignment : 64 2025-05-07T19:43:01.4713627Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4714013Z power management: 2025-05-07T19:43:01.4714140Z 2025-05-07T19:43:01.4714217Z processor : 25 2025-05-07T19:43:01.4714426Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4714833Z cpu family : 6 2025-05-07T19:43:01.4715038Z model : 85 2025-05-07T19:43:01.4715301Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4715654Z stepping : 7 2025-05-07T19:43:01.4715848Z microcode : 0x5003901 2025-05-07T19:43:01.4716072Z cpu MHz : 3093.434 2025-05-07T19:43:01.4716285Z cache size : 36608 KB 2025-05-07T19:43:01.4716498Z physical id : 1 2025-05-07T19:43:01.4716704Z siblings : 48 2025-05-07T19:43:01.4716895Z core id : 1 2025-05-07T19:43:01.4717093Z cpu cores : 24 2025-05-07T19:43:01.4717288Z apicid : 66 2025-05-07T19:43:01.4717491Z initial apicid : 66 2025-05-07T19:43:01.4717694Z fpu : yes 2025-05-07T19:43:01.4717892Z fpu_exception : yes 2025-05-07T19:43:01.4718096Z cpuid level : 13 2025-05-07T19:43:01.4718298Z wp : yes 2025-05-07T19:43:01.4720665Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4723296Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4723877Z bogomips : 5999.99 2025-05-07T19:43:01.4724088Z clflush size : 64 2025-05-07T19:43:01.4724306Z cache_alignment : 64 2025-05-07T19:43:01.4724585Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4724899Z power management: 2025-05-07T19:43:01.4725028Z 2025-05-07T19:43:01.4725119Z processor : 26 2025-05-07T19:43:01.4725326Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4725562Z cpu family : 6 2025-05-07T19:43:01.4725759Z model : 85 2025-05-07T19:43:01.4726205Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4726557Z stepping : 7 2025-05-07T19:43:01.4726771Z microcode : 0x5003901 2025-05-07T19:43:01.4727003Z cpu MHz : 3069.568 2025-05-07T19:43:01.4727225Z cache size : 36608 KB 2025-05-07T19:43:01.4727466Z physical id : 1 2025-05-07T19:43:01.4727681Z siblings : 48 2025-05-07T19:43:01.4727892Z core id : 2 2025-05-07T19:43:01.4728095Z cpu cores : 24 2025-05-07T19:43:01.4728307Z apicid : 68 2025-05-07T19:43:01.4728506Z initial apicid : 68 2025-05-07T19:43:01.4728723Z fpu : yes 2025-05-07T19:43:01.4728916Z fpu_exception : yes 2025-05-07T19:43:01.4729143Z cpuid level : 13 2025-05-07T19:43:01.4729343Z wp : yes 2025-05-07T19:43:01.4731681Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4734472Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4735071Z bogomips : 5999.99 2025-05-07T19:43:01.4735318Z clflush size : 64 2025-05-07T19:43:01.4735552Z cache_alignment : 64 2025-05-07T19:43:01.4735831Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4736180Z power management: 2025-05-07T19:43:01.4736320Z 2025-05-07T19:43:01.4736402Z processor : 27 2025-05-07T19:43:01.4736633Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4736874Z cpu family : 6 2025-05-07T19:43:01.4737090Z model : 85 2025-05-07T19:43:01.4737381Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4737748Z stepping : 7 2025-05-07T19:43:01.4737965Z microcode : 0x5003901 2025-05-07T19:43:01.4738204Z cpu MHz : 3144.925 2025-05-07T19:43:01.4738439Z cache size : 36608 KB 2025-05-07T19:43:01.4738660Z physical id : 1 2025-05-07T19:43:01.4738888Z siblings : 48 2025-05-07T19:43:01.4739090Z core id : 3 2025-05-07T19:43:01.4739323Z cpu cores : 24 2025-05-07T19:43:01.4739654Z apicid : 70 2025-05-07T19:43:01.4740058Z initial apicid : 70 2025-05-07T19:43:01.4740461Z fpu : yes 2025-05-07T19:43:01.4740712Z fpu_exception : yes 2025-05-07T19:43:01.4740932Z cpuid level : 13 2025-05-07T19:43:01.4741150Z wp : yes 2025-05-07T19:43:01.4743597Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4746303Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4746910Z bogomips : 5999.99 2025-05-07T19:43:01.4747141Z clflush size : 64 2025-05-07T19:43:01.4747355Z cache_alignment : 64 2025-05-07T19:43:01.4747653Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4747981Z power management: 2025-05-07T19:43:01.4748113Z 2025-05-07T19:43:01.4748212Z processor : 28 2025-05-07T19:43:01.4748424Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4748673Z cpu family : 6 2025-05-07T19:43:01.4748867Z model : 85 2025-05-07T19:43:01.4749158Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4749511Z stepping : 7 2025-05-07T19:43:01.4749736Z microcode : 0x5003901 2025-05-07T19:43:01.4749967Z cpu MHz : 3156.044 2025-05-07T19:43:01.4750212Z cache size : 36608 KB 2025-05-07T19:43:01.4750451Z physical id : 1 2025-05-07T19:43:01.4750658Z siblings : 48 2025-05-07T19:43:01.4750860Z core id : 4 2025-05-07T19:43:01.4751055Z cpu cores : 24 2025-05-07T19:43:01.4751259Z apicid : 72 2025-05-07T19:43:01.4751458Z initial apicid : 72 2025-05-07T19:43:01.4751675Z fpu : yes 2025-05-07T19:43:01.4751869Z fpu_exception : yes 2025-05-07T19:43:01.4752199Z cpuid level : 13 2025-05-07T19:43:01.4752389Z wp : yes 2025-05-07T19:43:01.4754539Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4757111Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4757667Z bogomips : 5999.99 2025-05-07T19:43:01.4757874Z clflush size : 64 2025-05-07T19:43:01.4758085Z cache_alignment : 64 2025-05-07T19:43:01.4758334Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4758651Z power management: 2025-05-07T19:43:01.4758776Z 2025-05-07T19:43:01.4758851Z processor : 29 2025-05-07T19:43:01.4759045Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4759259Z cpu family : 6 2025-05-07T19:43:01.4759447Z model : 85 2025-05-07T19:43:01.4759709Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4760066Z stepping : 7 2025-05-07T19:43:01.4760274Z microcode : 0x5003901 2025-05-07T19:43:01.4760512Z cpu MHz : 2999.998 2025-05-07T19:43:01.4760739Z cache size : 36608 KB 2025-05-07T19:43:01.4760954Z physical id : 1 2025-05-07T19:43:01.4761177Z siblings : 48 2025-05-07T19:43:01.4761372Z core id : 5 2025-05-07T19:43:01.4761586Z cpu cores : 24 2025-05-07T19:43:01.4761779Z apicid : 74 2025-05-07T19:43:01.4761988Z initial apicid : 74 2025-05-07T19:43:01.4762190Z fpu : yes 2025-05-07T19:43:01.4762398Z fpu_exception : yes 2025-05-07T19:43:01.4762606Z cpuid level : 13 2025-05-07T19:43:01.4762821Z wp : yes 2025-05-07T19:43:01.4765074Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4767575Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4768142Z bogomips : 5999.99 2025-05-07T19:43:01.4768381Z clflush size : 64 2025-05-07T19:43:01.4768592Z cache_alignment : 64 2025-05-07T19:43:01.4768868Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4769177Z power management: 2025-05-07T19:43:01.4769306Z 2025-05-07T19:43:01.4769405Z processor : 30 2025-05-07T19:43:01.4769614Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4769861Z cpu family : 6 2025-05-07T19:43:01.4770059Z model : 85 2025-05-07T19:43:01.4770339Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4770676Z stepping : 7 2025-05-07T19:43:01.4770895Z microcode : 0x5003901 2025-05-07T19:43:01.4771115Z cpu MHz : 3137.235 2025-05-07T19:43:01.4771336Z cache size : 36608 KB 2025-05-07T19:43:01.4771572Z physical id : 1 2025-05-07T19:43:01.4771773Z siblings : 48 2025-05-07T19:43:01.4771988Z core id : 6 2025-05-07T19:43:01.4772176Z cpu cores : 24 2025-05-07T19:43:01.4772385Z apicid : 76 2025-05-07T19:43:01.4772581Z initial apicid : 76 2025-05-07T19:43:01.4772802Z fpu : yes 2025-05-07T19:43:01.4772993Z fpu_exception : yes 2025-05-07T19:43:01.4773218Z cpuid level : 13 2025-05-07T19:43:01.4773427Z wp : yes 2025-05-07T19:43:01.4775617Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4778129Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4778742Z bogomips : 5999.99 2025-05-07T19:43:01.4778966Z clflush size : 64 2025-05-07T19:43:01.4779194Z cache_alignment : 64 2025-05-07T19:43:01.4779544Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4780054Z power management: 2025-05-07T19:43:01.4780196Z 2025-05-07T19:43:01.4780285Z processor : 31 2025-05-07T19:43:01.4780532Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4780872Z cpu family : 6 2025-05-07T19:43:01.4781101Z model : 85 2025-05-07T19:43:01.4781391Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4781773Z stepping : 7 2025-05-07T19:43:01.4781986Z microcode : 0x5003901 2025-05-07T19:43:01.4782234Z cpu MHz : 3078.024 2025-05-07T19:43:01.4782477Z cache size : 36608 KB 2025-05-07T19:43:01.4782708Z physical id : 1 2025-05-07T19:43:01.4782941Z siblings : 48 2025-05-07T19:43:01.4783144Z core id : 7 2025-05-07T19:43:01.4783362Z cpu cores : 24 2025-05-07T19:43:01.4783571Z apicid : 78 2025-05-07T19:43:01.4783796Z initial apicid : 78 2025-05-07T19:43:01.4784016Z fpu : yes 2025-05-07T19:43:01.4784234Z fpu_exception : yes 2025-05-07T19:43:01.4784455Z cpuid level : 13 2025-05-07T19:43:01.4784680Z wp : yes 2025-05-07T19:43:01.4787068Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4789767Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4790384Z bogomips : 5999.99 2025-05-07T19:43:01.4790627Z clflush size : 64 2025-05-07T19:43:01.4790852Z cache_alignment : 64 2025-05-07T19:43:01.4791161Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4791496Z power management: 2025-05-07T19:43:01.4791635Z 2025-05-07T19:43:01.4791739Z processor : 32 2025-05-07T19:43:01.4791965Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4792343Z cpu family : 6 2025-05-07T19:43:01.4792547Z model : 85 2025-05-07T19:43:01.4792820Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4793154Z stepping : 7 2025-05-07T19:43:01.4793366Z microcode : 0x5003901 2025-05-07T19:43:01.4793587Z cpu MHz : 3085.253 2025-05-07T19:43:01.4793808Z cache size : 36608 KB 2025-05-07T19:43:01.4794044Z physical id : 1 2025-05-07T19:43:01.4794244Z siblings : 48 2025-05-07T19:43:01.4794453Z core id : 8 2025-05-07T19:43:01.4794628Z cpu cores : 24 2025-05-07T19:43:01.4794832Z apicid : 80 2025-05-07T19:43:01.4795017Z initial apicid : 80 2025-05-07T19:43:01.4795228Z fpu : yes 2025-05-07T19:43:01.4795411Z fpu_exception : yes 2025-05-07T19:43:01.4795623Z cpuid level : 13 2025-05-07T19:43:01.4795813Z wp : yes 2025-05-07T19:43:01.4798203Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4801362Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4802069Z bogomips : 5999.99 2025-05-07T19:43:01.4802299Z clflush size : 64 2025-05-07T19:43:01.4802521Z cache_alignment : 64 2025-05-07T19:43:01.4802798Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4803136Z power management: 2025-05-07T19:43:01.4803270Z 2025-05-07T19:43:01.4803353Z processor : 33 2025-05-07T19:43:01.4803572Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4803807Z cpu family : 6 2025-05-07T19:43:01.4804017Z model : 85 2025-05-07T19:43:01.4804291Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4804654Z stepping : 7 2025-05-07T19:43:01.4804862Z microcode : 0x5003901 2025-05-07T19:43:01.4805097Z cpu MHz : 2999.998 2025-05-07T19:43:01.4805321Z cache size : 36608 KB 2025-05-07T19:43:01.4805538Z physical id : 1 2025-05-07T19:43:01.4805765Z siblings : 48 2025-05-07T19:43:01.4805957Z core id : 9 2025-05-07T19:43:01.4806164Z cpu cores : 24 2025-05-07T19:43:01.4806356Z apicid : 82 2025-05-07T19:43:01.4806570Z initial apicid : 82 2025-05-07T19:43:01.4806785Z fpu : yes 2025-05-07T19:43:01.4806990Z fpu_exception : yes 2025-05-07T19:43:01.4807200Z cpuid level : 13 2025-05-07T19:43:01.4807414Z wp : yes 2025-05-07T19:43:01.4809891Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4812670Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4813389Z bogomips : 5999.99 2025-05-07T19:43:01.4813611Z clflush size : 64 2025-05-07T19:43:01.4813823Z cache_alignment : 64 2025-05-07T19:43:01.4814099Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4814417Z power management: 2025-05-07T19:43:01.4814544Z 2025-05-07T19:43:01.4814634Z processor : 34 2025-05-07T19:43:01.4814842Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4815082Z cpu family : 6 2025-05-07T19:43:01.4815270Z model : 85 2025-05-07T19:43:01.4815545Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4815886Z stepping : 7 2025-05-07T19:43:01.4816095Z microcode : 0x5003901 2025-05-07T19:43:01.4816310Z cpu MHz : 3788.366 2025-05-07T19:43:01.4816527Z cache size : 36608 KB 2025-05-07T19:43:01.4816756Z physical id : 1 2025-05-07T19:43:01.4816951Z siblings : 48 2025-05-07T19:43:01.4817160Z core id : 10 2025-05-07T19:43:01.4817353Z cpu cores : 24 2025-05-07T19:43:01.4817677Z apicid : 84 2025-05-07T19:43:01.4817864Z initial apicid : 84 2025-05-07T19:43:01.4818069Z fpu : yes 2025-05-07T19:43:01.4818250Z fpu_exception : yes 2025-05-07T19:43:01.4818335Z cpuid level : 13 2025-05-07T19:43:01.4818405Z wp : yes 2025-05-07T19:43:01.4820759Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4821179Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4821268Z bogomips : 5999.99 2025-05-07T19:43:01.4821350Z clflush size : 64 2025-05-07T19:43:01.4821503Z cache_alignment : 64 2025-05-07T19:43:01.4821634Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4821720Z power management: 2025-05-07T19:43:01.4821724Z 2025-05-07T19:43:01.4821806Z processor : 35 2025-05-07T19:43:01.4821910Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4821989Z cpu family : 6 2025-05-07T19:43:01.4822069Z model : 85 2025-05-07T19:43:01.4822244Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4822326Z stepping : 7 2025-05-07T19:43:01.4822413Z microcode : 0x5003901 2025-05-07T19:43:01.4822494Z cpu MHz : 3165.258 2025-05-07T19:43:01.4822592Z cache size : 36608 KB 2025-05-07T19:43:01.4822675Z physical id : 1 2025-05-07T19:43:01.4822753Z siblings : 48 2025-05-07T19:43:01.4822841Z core id : 11 2025-05-07T19:43:01.4822919Z cpu cores : 24 2025-05-07T19:43:01.4823000Z apicid : 86 2025-05-07T19:43:01.4823090Z initial apicid : 86 2025-05-07T19:43:01.4823177Z fpu : yes 2025-05-07T19:43:01.4823262Z fpu_exception : yes 2025-05-07T19:43:01.4823344Z cpuid level : 13 2025-05-07T19:43:01.4823429Z wp : yes 2025-05-07T19:43:01.4825630Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4826082Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4826178Z bogomips : 5999.99 2025-05-07T19:43:01.4826264Z clflush size : 64 2025-05-07T19:43:01.4826350Z cache_alignment : 64 2025-05-07T19:43:01.4826495Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4826581Z power management: 2025-05-07T19:43:01.4826585Z 2025-05-07T19:43:01.4826667Z processor : 36 2025-05-07T19:43:01.4826757Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4826848Z cpu family : 6 2025-05-07T19:43:01.4826926Z model : 85 2025-05-07T19:43:01.4827085Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4827176Z stepping : 7 2025-05-07T19:43:01.4827260Z microcode : 0x5003901 2025-05-07T19:43:01.4827341Z cpu MHz : 2999.998 2025-05-07T19:43:01.4827425Z cache size : 36608 KB 2025-05-07T19:43:01.4827517Z physical id : 1 2025-05-07T19:43:01.4827596Z siblings : 48 2025-05-07T19:43:01.4827675Z core id : 12 2025-05-07T19:43:01.4827771Z cpu cores : 24 2025-05-07T19:43:01.4827847Z apicid : 88 2025-05-07T19:43:01.4827935Z initial apicid : 88 2025-05-07T19:43:01.4828013Z fpu : yes 2025-05-07T19:43:01.4828113Z fpu_exception : yes 2025-05-07T19:43:01.4828193Z cpuid level : 13 2025-05-07T19:43:01.4828272Z wp : yes 2025-05-07T19:43:01.4830506Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4830904Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4830993Z bogomips : 5999.99 2025-05-07T19:43:01.4831089Z clflush size : 64 2025-05-07T19:43:01.4831176Z cache_alignment : 64 2025-05-07T19:43:01.4831306Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4831451Z power management: 2025-05-07T19:43:01.4831456Z 2025-05-07T19:43:01.4831537Z processor : 37 2025-05-07T19:43:01.4831625Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4831705Z cpu family : 6 2025-05-07T19:43:01.4831789Z model : 85 2025-05-07T19:43:01.4831949Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4832028Z stepping : 7 2025-05-07T19:43:01.4832239Z microcode : 0x5003901 2025-05-07T19:43:01.4832315Z cpu MHz : 2999.998 2025-05-07T19:43:01.4832391Z cache size : 36608 KB 2025-05-07T19:43:01.4832465Z physical id : 1 2025-05-07T19:43:01.4832545Z siblings : 48 2025-05-07T19:43:01.4832616Z core id : 13 2025-05-07T19:43:01.4832690Z cpu cores : 24 2025-05-07T19:43:01.4832772Z apicid : 90 2025-05-07T19:43:01.4832853Z initial apicid : 90 2025-05-07T19:43:01.4832925Z fpu : yes 2025-05-07T19:43:01.4833005Z fpu_exception : yes 2025-05-07T19:43:01.4833094Z cpuid level : 13 2025-05-07T19:43:01.4833164Z wp : yes 2025-05-07T19:43:01.4835222Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4835599Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4835675Z bogomips : 5999.99 2025-05-07T19:43:01.4835802Z clflush size : 64 2025-05-07T19:43:01.4835892Z cache_alignment : 64 2025-05-07T19:43:01.4836015Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4836096Z power management: 2025-05-07T19:43:01.4836103Z 2025-05-07T19:43:01.4836188Z processor : 38 2025-05-07T19:43:01.4836276Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4836354Z cpu family : 6 2025-05-07T19:43:01.4836427Z model : 85 2025-05-07T19:43:01.4836590Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4836668Z stepping : 7 2025-05-07T19:43:01.4836750Z microcode : 0x5003901 2025-05-07T19:43:01.4836835Z cpu MHz : 2999.998 2025-05-07T19:43:01.4836912Z cache size : 36608 KB 2025-05-07T19:43:01.4836991Z physical id : 1 2025-05-07T19:43:01.4837067Z siblings : 48 2025-05-07T19:43:01.4837155Z core id : 14 2025-05-07T19:43:01.4837231Z cpu cores : 24 2025-05-07T19:43:01.4837314Z apicid : 92 2025-05-07T19:43:01.4837416Z initial apicid : 92 2025-05-07T19:43:01.4837498Z fpu : yes 2025-05-07T19:43:01.4837585Z fpu_exception : yes 2025-05-07T19:43:01.4837668Z cpuid level : 13 2025-05-07T19:43:01.4837764Z wp : yes 2025-05-07T19:43:01.4839803Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4840195Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4840273Z bogomips : 5999.99 2025-05-07T19:43:01.4840351Z clflush size : 64 2025-05-07T19:43:01.4840438Z cache_alignment : 64 2025-05-07T19:43:01.4840582Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4840662Z power management: 2025-05-07T19:43:01.4840666Z 2025-05-07T19:43:01.4840787Z processor : 39 2025-05-07T19:43:01.4840893Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4840972Z cpu family : 6 2025-05-07T19:43:01.4841050Z model : 85 2025-05-07T19:43:01.4841208Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4841303Z stepping : 7 2025-05-07T19:43:01.4841383Z microcode : 0x5003901 2025-05-07T19:43:01.4841463Z cpu MHz : 2999.998 2025-05-07T19:43:01.4841559Z cache size : 36608 KB 2025-05-07T19:43:01.4841637Z physical id : 1 2025-05-07T19:43:01.4841712Z siblings : 48 2025-05-07T19:43:01.4841783Z core id : 15 2025-05-07T19:43:01.4841870Z cpu cores : 24 2025-05-07T19:43:01.4841945Z apicid : 94 2025-05-07T19:43:01.4842025Z initial apicid : 94 2025-05-07T19:43:01.4842099Z fpu : yes 2025-05-07T19:43:01.4842192Z fpu_exception : yes 2025-05-07T19:43:01.4842271Z cpuid level : 13 2025-05-07T19:43:01.4842344Z wp : yes 2025-05-07T19:43:01.4844406Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4844776Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4844858Z bogomips : 5999.99 2025-05-07T19:43:01.4844939Z clflush size : 64 2025-05-07T19:43:01.4845015Z cache_alignment : 64 2025-05-07T19:43:01.4845177Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4845259Z power management: 2025-05-07T19:43:01.4845263Z 2025-05-07T19:43:01.4845333Z processor : 40 2025-05-07T19:43:01.4845415Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4845498Z cpu family : 6 2025-05-07T19:43:01.4845569Z model : 85 2025-05-07T19:43:01.4845718Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4845789Z stepping : 7 2025-05-07T19:43:01.4845880Z microcode : 0x5003901 2025-05-07T19:43:01.4845953Z cpu MHz : 2999.998 2025-05-07T19:43:01.4846031Z cache size : 36608 KB 2025-05-07T19:43:01.4846121Z physical id : 1 2025-05-07T19:43:01.4846195Z siblings : 48 2025-05-07T19:43:01.4846272Z core id : 16 2025-05-07T19:43:01.4846345Z cpu cores : 24 2025-05-07T19:43:01.4846441Z apicid : 96 2025-05-07T19:43:01.4846518Z initial apicid : 96 2025-05-07T19:43:01.4846594Z fpu : yes 2025-05-07T19:43:01.4846681Z fpu_exception : yes 2025-05-07T19:43:01.4846785Z cpuid level : 13 2025-05-07T19:43:01.4846858Z wp : yes 2025-05-07T19:43:01.4848907Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4849297Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4849381Z bogomips : 5999.99 2025-05-07T19:43:01.4849462Z clflush size : 64 2025-05-07T19:43:01.4849561Z cache_alignment : 64 2025-05-07T19:43:01.4849690Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4849774Z power management: 2025-05-07T19:43:01.4849779Z 2025-05-07T19:43:01.4849880Z processor : 41 2025-05-07T19:43:01.4849970Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4850103Z cpu family : 6 2025-05-07T19:43:01.4850178Z model : 85 2025-05-07T19:43:01.4850354Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4850437Z stepping : 7 2025-05-07T19:43:01.4850517Z microcode : 0x5003901 2025-05-07T19:43:01.4850606Z cpu MHz : 2999.998 2025-05-07T19:43:01.4850685Z cache size : 36608 KB 2025-05-07T19:43:01.4850767Z physical id : 1 2025-05-07T19:43:01.4850840Z siblings : 48 2025-05-07T19:43:01.4850927Z core id : 17 2025-05-07T19:43:01.4851002Z cpu cores : 24 2025-05-07T19:43:01.4851080Z apicid : 98 2025-05-07T19:43:01.4851174Z initial apicid : 98 2025-05-07T19:43:01.4851249Z fpu : yes 2025-05-07T19:43:01.4851330Z fpu_exception : yes 2025-05-07T19:43:01.4851407Z cpuid level : 13 2025-05-07T19:43:01.4851486Z wp : yes 2025-05-07T19:43:01.4853532Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4853911Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4853988Z bogomips : 5999.99 2025-05-07T19:43:01.4854065Z clflush size : 64 2025-05-07T19:43:01.4854142Z cache_alignment : 64 2025-05-07T19:43:01.4854273Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4854353Z power management: 2025-05-07T19:43:01.4854418Z 2025-05-07T19:43:01.4854495Z processor : 42 2025-05-07T19:43:01.4854587Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4854661Z cpu family : 6 2025-05-07T19:43:01.4854738Z model : 85 2025-05-07T19:43:01.4854889Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4854970Z stepping : 7 2025-05-07T19:43:01.4855053Z microcode : 0x5003901 2025-05-07T19:43:01.4855132Z cpu MHz : 2999.998 2025-05-07T19:43:01.4855220Z cache size : 36608 KB 2025-05-07T19:43:01.4855306Z physical id : 1 2025-05-07T19:43:01.4855382Z siblings : 48 2025-05-07T19:43:01.4855458Z core id : 18 2025-05-07T19:43:01.4855546Z cpu cores : 24 2025-05-07T19:43:01.4855621Z apicid : 100 2025-05-07T19:43:01.4855704Z initial apicid : 100 2025-05-07T19:43:01.4855786Z fpu : yes 2025-05-07T19:43:01.4855865Z fpu_exception : yes 2025-05-07T19:43:01.4855939Z cpuid level : 13 2025-05-07T19:43:01.4856010Z wp : yes 2025-05-07T19:43:01.4858343Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4858770Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4858897Z bogomips : 5999.99 2025-05-07T19:43:01.4858980Z clflush size : 64 2025-05-07T19:43:01.4859065Z cache_alignment : 64 2025-05-07T19:43:01.4859195Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4859292Z power management: 2025-05-07T19:43:01.4859297Z 2025-05-07T19:43:01.4859380Z processor : 43 2025-05-07T19:43:01.4859546Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4859643Z cpu family : 6 2025-05-07T19:43:01.4859719Z model : 85 2025-05-07T19:43:01.4860115Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4860195Z stepping : 7 2025-05-07T19:43:01.4860295Z microcode : 0x5003901 2025-05-07T19:43:01.4860375Z cpu MHz : 2999.998 2025-05-07T19:43:01.4860456Z cache size : 36608 KB 2025-05-07T19:43:01.4860554Z physical id : 1 2025-05-07T19:43:01.4860638Z siblings : 48 2025-05-07T19:43:01.4860715Z core id : 19 2025-05-07T19:43:01.4860792Z cpu cores : 24 2025-05-07T19:43:01.4860883Z apicid : 102 2025-05-07T19:43:01.4860969Z initial apicid : 102 2025-05-07T19:43:01.4861042Z fpu : yes 2025-05-07T19:43:01.4861142Z fpu_exception : yes 2025-05-07T19:43:01.4861224Z cpuid level : 13 2025-05-07T19:43:01.4861301Z wp : yes 2025-05-07T19:43:01.4863548Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4863949Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4864030Z bogomips : 5999.99 2025-05-07T19:43:01.4864125Z clflush size : 64 2025-05-07T19:43:01.4864210Z cache_alignment : 64 2025-05-07T19:43:01.4864338Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4864423Z power management: 2025-05-07T19:43:01.4864427Z 2025-05-07T19:43:01.4864522Z processor : 44 2025-05-07T19:43:01.4865423Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4865518Z cpu family : 6 2025-05-07T19:43:01.4865608Z model : 85 2025-05-07T19:43:01.4865775Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4865857Z stepping : 7 2025-05-07T19:43:01.4865941Z microcode : 0x5003901 2025-05-07T19:43:01.4866036Z cpu MHz : 2999.998 2025-05-07T19:43:01.4866121Z cache size : 36608 KB 2025-05-07T19:43:01.4866203Z physical id : 1 2025-05-07T19:43:01.4866294Z siblings : 48 2025-05-07T19:43:01.4866371Z core id : 20 2025-05-07T19:43:01.4866454Z cpu cores : 24 2025-05-07T19:43:01.4866533Z apicid : 104 2025-05-07T19:43:01.4866636Z initial apicid : 104 2025-05-07T19:43:01.4866713Z fpu : yes 2025-05-07T19:43:01.4866800Z fpu_exception : yes 2025-05-07T19:43:01.4866891Z cpuid level : 13 2025-05-07T19:43:01.4866967Z wp : yes 2025-05-07T19:43:01.4869193Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4869598Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4869684Z bogomips : 5999.99 2025-05-07T19:43:01.4869769Z clflush size : 64 2025-05-07T19:43:01.4869870Z cache_alignment : 64 2025-05-07T19:43:01.4870001Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4870088Z power management: 2025-05-07T19:43:01.4870093Z 2025-05-07T19:43:01.4870176Z processor : 45 2025-05-07T19:43:01.4870275Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4870360Z cpu family : 6 2025-05-07T19:43:01.4870439Z model : 85 2025-05-07T19:43:01.4870611Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4870689Z stepping : 7 2025-05-07T19:43:01.4870834Z microcode : 0x5003901 2025-05-07T19:43:01.4870913Z cpu MHz : 3149.150 2025-05-07T19:43:01.4871005Z cache size : 36608 KB 2025-05-07T19:43:01.4871085Z physical id : 1 2025-05-07T19:43:01.4871162Z siblings : 48 2025-05-07T19:43:01.4871261Z core id : 21 2025-05-07T19:43:01.4871338Z cpu cores : 24 2025-05-07T19:43:01.4871417Z apicid : 106 2025-05-07T19:43:01.4871501Z initial apicid : 106 2025-05-07T19:43:01.4871592Z fpu : yes 2025-05-07T19:43:01.4871680Z fpu_exception : yes 2025-05-07T19:43:01.4871763Z cpuid level : 13 2025-05-07T19:43:01.4871841Z wp : yes 2025-05-07T19:43:01.4874048Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4874415Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4874503Z bogomips : 5999.99 2025-05-07T19:43:01.4874582Z clflush size : 64 2025-05-07T19:43:01.4874665Z cache_alignment : 64 2025-05-07T19:43:01.4874800Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4874878Z power management: 2025-05-07T19:43:01.4874882Z 2025-05-07T19:43:01.4874959Z processor : 46 2025-05-07T19:43:01.4875046Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4875137Z cpu family : 6 2025-05-07T19:43:01.4875212Z model : 85 2025-05-07T19:43:01.4875411Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4875496Z stepping : 7 2025-05-07T19:43:01.4875580Z microcode : 0x5003901 2025-05-07T19:43:01.4875662Z cpu MHz : 3132.525 2025-05-07T19:43:01.4875739Z cache size : 36608 KB 2025-05-07T19:43:01.4875829Z physical id : 1 2025-05-07T19:43:01.4875909Z siblings : 48 2025-05-07T19:43:01.4875983Z core id : 22 2025-05-07T19:43:01.4876068Z cpu cores : 24 2025-05-07T19:43:01.4876139Z apicid : 108 2025-05-07T19:43:01.4876223Z initial apicid : 108 2025-05-07T19:43:01.4876300Z fpu : yes 2025-05-07T19:43:01.4876400Z fpu_exception : yes 2025-05-07T19:43:01.4876474Z cpuid level : 13 2025-05-07T19:43:01.4876549Z wp : yes 2025-05-07T19:43:01.4879116Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4879566Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4879649Z bogomips : 5999.99 2025-05-07T19:43:01.4879735Z clflush size : 64 2025-05-07T19:43:01.4879820Z cache_alignment : 64 2025-05-07T19:43:01.4879947Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4880036Z power management: 2025-05-07T19:43:01.4880040Z 2025-05-07T19:43:01.4880119Z processor : 47 2025-05-07T19:43:01.4880209Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4880287Z cpu family : 6 2025-05-07T19:43:01.4880371Z model : 85 2025-05-07T19:43:01.4880531Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4880614Z stepping : 7 2025-05-07T19:43:01.4880702Z microcode : 0x5003901 2025-05-07T19:43:01.4880776Z cpu MHz : 3132.610 2025-05-07T19:43:01.4880912Z cache size : 36608 KB 2025-05-07T19:43:01.4880990Z physical id : 1 2025-05-07T19:43:01.4881077Z siblings : 48 2025-05-07T19:43:01.4881150Z core id : 23 2025-05-07T19:43:01.4881223Z cpu cores : 24 2025-05-07T19:43:01.4881302Z apicid : 110 2025-05-07T19:43:01.4881399Z initial apicid : 110 2025-05-07T19:43:01.4881470Z fpu : yes 2025-05-07T19:43:01.4881551Z fpu_exception : yes 2025-05-07T19:43:01.4881646Z cpuid level : 13 2025-05-07T19:43:01.4881720Z wp : yes 2025-05-07T19:43:01.4883879Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4884277Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4884356Z bogomips : 5999.99 2025-05-07T19:43:01.4884436Z clflush size : 64 2025-05-07T19:43:01.4884533Z cache_alignment : 64 2025-05-07T19:43:01.4884658Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4884741Z power management: 2025-05-07T19:43:01.4884745Z 2025-05-07T19:43:01.4884827Z processor : 48 2025-05-07T19:43:01.4884914Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4884990Z cpu family : 6 2025-05-07T19:43:01.4885061Z model : 85 2025-05-07T19:43:01.4885235Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4885362Z stepping : 7 2025-05-07T19:43:01.4885445Z microcode : 0x5003901 2025-05-07T19:43:01.4885529Z cpu MHz : 2999.998 2025-05-07T19:43:01.4885607Z cache size : 36608 KB 2025-05-07T19:43:01.4885692Z physical id : 0 2025-05-07T19:43:01.4885771Z siblings : 48 2025-05-07T19:43:01.4885854Z core id : 0 2025-05-07T19:43:01.4885932Z cpu cores : 24 2025-05-07T19:43:01.4886009Z apicid : 1 2025-05-07T19:43:01.4886092Z initial apicid : 1 2025-05-07T19:43:01.4886170Z fpu : yes 2025-05-07T19:43:01.4886256Z fpu_exception : yes 2025-05-07T19:43:01.4886334Z cpuid level : 13 2025-05-07T19:43:01.4886424Z wp : yes 2025-05-07T19:43:01.4888590Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4888983Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4889073Z bogomips : 5999.99 2025-05-07T19:43:01.4889157Z clflush size : 64 2025-05-07T19:43:01.4889242Z cache_alignment : 64 2025-05-07T19:43:01.4889379Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4889467Z power management: 2025-05-07T19:43:01.4889472Z 2025-05-07T19:43:01.4889556Z processor : 49 2025-05-07T19:43:01.4889652Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4889726Z cpu family : 6 2025-05-07T19:43:01.4889801Z model : 85 2025-05-07T19:43:01.4889960Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4890051Z stepping : 7 2025-05-07T19:43:01.4890138Z microcode : 0x5003901 2025-05-07T19:43:01.4890217Z cpu MHz : 2999.998 2025-05-07T19:43:01.4890295Z cache size : 36608 KB 2025-05-07T19:43:01.4890380Z physical id : 0 2025-05-07T19:43:01.4890524Z siblings : 48 2025-05-07T19:43:01.4890596Z core id : 1 2025-05-07T19:43:01.4890678Z cpu cores : 24 2025-05-07T19:43:01.4890750Z apicid : 3 2025-05-07T19:43:01.4890944Z initial apicid : 3 2025-05-07T19:43:01.4891013Z fpu : yes 2025-05-07T19:43:01.4891097Z fpu_exception : yes 2025-05-07T19:43:01.4891172Z cpuid level : 13 2025-05-07T19:43:01.4891249Z wp : yes 2025-05-07T19:43:01.4893328Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4893697Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4893775Z bogomips : 5999.99 2025-05-07T19:43:01.4893863Z clflush size : 64 2025-05-07T19:43:01.4893945Z cache_alignment : 64 2025-05-07T19:43:01.4894071Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4894160Z power management: 2025-05-07T19:43:01.4894164Z 2025-05-07T19:43:01.4894239Z processor : 50 2025-05-07T19:43:01.4894322Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4894397Z cpu family : 6 2025-05-07T19:43:01.4894474Z model : 85 2025-05-07T19:43:01.4894625Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4894701Z stepping : 7 2025-05-07T19:43:01.4894788Z microcode : 0x5003901 2025-05-07T19:43:01.4894914Z cpu MHz : 1250.432 2025-05-07T19:43:01.4894991Z cache size : 36608 KB 2025-05-07T19:43:01.4895066Z physical id : 0 2025-05-07T19:43:01.4895155Z siblings : 48 2025-05-07T19:43:01.4895226Z core id : 2 2025-05-07T19:43:01.4895297Z cpu cores : 24 2025-05-07T19:43:01.4895383Z apicid : 5 2025-05-07T19:43:01.4895462Z initial apicid : 5 2025-05-07T19:43:01.4895537Z fpu : yes 2025-05-07T19:43:01.4895613Z fpu_exception : yes 2025-05-07T19:43:01.4895696Z cpuid level : 13 2025-05-07T19:43:01.4895769Z wp : yes 2025-05-07T19:43:01.4897802Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4898334Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4898591Z bogomips : 5999.99 2025-05-07T19:43:01.4898668Z clflush size : 64 2025-05-07T19:43:01.4898767Z cache_alignment : 64 2025-05-07T19:43:01.4898924Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4899005Z power management: 2025-05-07T19:43:01.4899010Z 2025-05-07T19:43:01.4899097Z processor : 51 2025-05-07T19:43:01.4899184Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4899263Z cpu family : 6 2025-05-07T19:43:01.4899337Z model : 85 2025-05-07T19:43:01.4899593Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4899677Z stepping : 7 2025-05-07T19:43:01.4899759Z microcode : 0x5003901 2025-05-07T19:43:01.4899850Z cpu MHz : 2999.998 2025-05-07T19:43:01.4900108Z cache size : 36608 KB 2025-05-07T19:43:01.4900193Z physical id : 0 2025-05-07T19:43:01.4900457Z siblings : 48 2025-05-07T19:43:01.4900554Z core id : 3 2025-05-07T19:43:01.4900636Z cpu cores : 24 2025-05-07T19:43:01.4900854Z apicid : 7 2025-05-07T19:43:01.4900952Z initial apicid : 7 2025-05-07T19:43:01.4901031Z fpu : yes 2025-05-07T19:43:01.4901118Z fpu_exception : yes 2025-05-07T19:43:01.4901201Z cpuid level : 13 2025-05-07T19:43:01.4901291Z wp : yes 2025-05-07T19:43:01.4903518Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4903931Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4904020Z bogomips : 5999.99 2025-05-07T19:43:01.4904100Z clflush size : 64 2025-05-07T19:43:01.4904186Z cache_alignment : 64 2025-05-07T19:43:01.4904326Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4904411Z power management: 2025-05-07T19:43:01.4904416Z 2025-05-07T19:43:01.4904498Z processor : 52 2025-05-07T19:43:01.4904597Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4904676Z cpu family : 6 2025-05-07T19:43:01.4904757Z model : 85 2025-05-07T19:43:01.4904923Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4905013Z stepping : 7 2025-05-07T19:43:01.4905098Z microcode : 0x5003901 2025-05-07T19:43:01.4905181Z cpu MHz : 2999.998 2025-05-07T19:43:01.4905274Z cache size : 36608 KB 2025-05-07T19:43:01.4905429Z physical id : 0 2025-05-07T19:43:01.4905509Z siblings : 48 2025-05-07T19:43:01.4905587Z core id : 4 2025-05-07T19:43:01.4905675Z cpu cores : 24 2025-05-07T19:43:01.4905753Z apicid : 9 2025-05-07T19:43:01.4905841Z initial apicid : 9 2025-05-07T19:43:01.4905926Z fpu : yes 2025-05-07T19:43:01.4906012Z fpu_exception : yes 2025-05-07T19:43:01.4906095Z cpuid level : 13 2025-05-07T19:43:01.4906172Z wp : yes 2025-05-07T19:43:01.4908397Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4908795Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4908895Z bogomips : 5999.99 2025-05-07T19:43:01.4908979Z clflush size : 64 2025-05-07T19:43:01.4909065Z cache_alignment : 64 2025-05-07T19:43:01.4909197Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4909300Z power management: 2025-05-07T19:43:01.4909304Z 2025-05-07T19:43:01.4909387Z processor : 53 2025-05-07T19:43:01.4909477Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4909571Z cpu family : 6 2025-05-07T19:43:01.4909651Z model : 85 2025-05-07T19:43:01.4909814Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4909895Z stepping : 7 2025-05-07T19:43:01.4909992Z microcode : 0x5003901 2025-05-07T19:43:01.4910072Z cpu MHz : 1190.582 2025-05-07T19:43:01.4910155Z cache size : 36608 KB 2025-05-07T19:43:01.4910250Z physical id : 0 2025-05-07T19:43:01.4910333Z siblings : 48 2025-05-07T19:43:01.4910419Z core id : 5 2025-05-07T19:43:01.4910498Z cpu cores : 24 2025-05-07T19:43:01.4910592Z apicid : 11 2025-05-07T19:43:01.4910680Z initial apicid : 11 2025-05-07T19:43:01.4910815Z fpu : yes 2025-05-07T19:43:01.4910900Z fpu_exception : yes 2025-05-07T19:43:01.4910994Z cpuid level : 13 2025-05-07T19:43:01.4911073Z wp : yes 2025-05-07T19:43:01.4913323Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4913706Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4913786Z bogomips : 5999.99 2025-05-07T19:43:01.4913871Z clflush size : 64 2025-05-07T19:43:01.4913964Z cache_alignment : 64 2025-05-07T19:43:01.4914091Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4914169Z power management: 2025-05-07T19:43:01.4914173Z 2025-05-07T19:43:01.4914266Z processor : 54 2025-05-07T19:43:01.4914355Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4914429Z cpu family : 6 2025-05-07T19:43:01.4914517Z model : 85 2025-05-07T19:43:01.4914671Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4914751Z stepping : 7 2025-05-07T19:43:01.4914828Z microcode : 0x5003901 2025-05-07T19:43:01.4914918Z cpu MHz : 2999.998 2025-05-07T19:43:01.4914997Z cache size : 36608 KB 2025-05-07T19:43:01.4915078Z physical id : 0 2025-05-07T19:43:01.4915153Z siblings : 48 2025-05-07T19:43:01.4915241Z core id : 6 2025-05-07T19:43:01.4915368Z cpu cores : 24 2025-05-07T19:43:01.4915442Z apicid : 13 2025-05-07T19:43:01.4915529Z initial apicid : 13 2025-05-07T19:43:01.4915602Z fpu : yes 2025-05-07T19:43:01.4915691Z fpu_exception : yes 2025-05-07T19:43:01.4915766Z cpuid level : 13 2025-05-07T19:43:01.4915847Z wp : yes 2025-05-07T19:43:01.4917904Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4918292Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4918368Z bogomips : 5999.99 2025-05-07T19:43:01.4918446Z clflush size : 64 2025-05-07T19:43:01.4918530Z cache_alignment : 64 2025-05-07T19:43:01.4918664Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4918742Z power management: 2025-05-07T19:43:01.4918746Z 2025-05-07T19:43:01.4918820Z processor : 55 2025-05-07T19:43:01.4918916Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4918988Z cpu family : 6 2025-05-07T19:43:01.4919056Z model : 85 2025-05-07T19:43:01.4919209Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4919293Z stepping : 7 2025-05-07T19:43:01.4919369Z microcode : 0x5003901 2025-05-07T19:43:01.4919443Z cpu MHz : 2999.998 2025-05-07T19:43:01.4919536Z cache size : 36608 KB 2025-05-07T19:43:01.4919610Z physical id : 0 2025-05-07T19:43:01.4919684Z siblings : 48 2025-05-07T19:43:01.4919759Z core id : 7 2025-05-07T19:43:01.4919846Z cpu cores : 24 2025-05-07T19:43:01.4919922Z apicid : 15 2025-05-07T19:43:01.4920001Z initial apicid : 15 2025-05-07T19:43:01.4920088Z fpu : yes 2025-05-07T19:43:01.4920167Z fpu_exception : yes 2025-05-07T19:43:01.4920291Z cpuid level : 13 2025-05-07T19:43:01.4920366Z wp : yes 2025-05-07T19:43:01.4922418Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4922789Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4922878Z bogomips : 5999.99 2025-05-07T19:43:01.4922953Z clflush size : 64 2025-05-07T19:43:01.4923032Z cache_alignment : 64 2025-05-07T19:43:01.4923160Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4923258Z power management: 2025-05-07T19:43:01.4923262Z 2025-05-07T19:43:01.4923336Z processor : 56 2025-05-07T19:43:01.4923419Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4923508Z cpu family : 6 2025-05-07T19:43:01.4923579Z model : 85 2025-05-07T19:43:01.4923728Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4923802Z stepping : 7 2025-05-07T19:43:01.4923892Z microcode : 0x5003901 2025-05-07T19:43:01.4923964Z cpu MHz : 2999.998 2025-05-07T19:43:01.4924041Z cache size : 36608 KB 2025-05-07T19:43:01.4924130Z physical id : 0 2025-05-07T19:43:01.4924203Z siblings : 48 2025-05-07T19:43:01.4924275Z core id : 8 2025-05-07T19:43:01.4924349Z cpu cores : 24 2025-05-07T19:43:01.4924439Z apicid : 17 2025-05-07T19:43:01.4924562Z initial apicid : 17 2025-05-07T19:43:01.4924635Z fpu : yes 2025-05-07T19:43:01.4924723Z fpu_exception : yes 2025-05-07T19:43:01.4924793Z cpuid level : 13 2025-05-07T19:43:01.4924869Z wp : yes 2025-05-07T19:43:01.4926924Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4927293Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4927373Z bogomips : 5999.99 2025-05-07T19:43:01.4927462Z clflush size : 64 2025-05-07T19:43:01.4927542Z cache_alignment : 64 2025-05-07T19:43:01.4927663Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4927746Z power management: 2025-05-07T19:43:01.4927749Z 2025-05-07T19:43:01.4927836Z processor : 57 2025-05-07T19:43:01.4927921Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4927998Z cpu family : 6 2025-05-07T19:43:01.4928084Z model : 85 2025-05-07T19:43:01.4928237Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4928317Z stepping : 7 2025-05-07T19:43:01.4928400Z microcode : 0x5003901 2025-05-07T19:43:01.4928487Z cpu MHz : 1270.309 2025-05-07T19:43:01.4928566Z cache size : 36608 KB 2025-05-07T19:43:01.4928641Z physical id : 0 2025-05-07T19:43:01.4928726Z siblings : 48 2025-05-07T19:43:01.4928799Z core id : 9 2025-05-07T19:43:01.4928872Z cpu cores : 24 2025-05-07T19:43:01.4928946Z apicid : 19 2025-05-07T19:43:01.4929034Z initial apicid : 19 2025-05-07T19:43:01.4929104Z fpu : yes 2025-05-07T19:43:01.4929187Z fpu_exception : yes 2025-05-07T19:43:01.4929275Z cpuid level : 13 2025-05-07T19:43:01.4929343Z wp : yes 2025-05-07T19:43:01.4931394Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4931836Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4931917Z bogomips : 5999.99 2025-05-07T19:43:01.4931993Z clflush size : 64 2025-05-07T19:43:01.4932090Z cache_alignment : 64 2025-05-07T19:43:01.4932209Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4932285Z power management: 2025-05-07T19:43:01.4932292Z 2025-05-07T19:43:01.4932368Z processor : 58 2025-05-07T19:43:01.4932464Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4932538Z cpu family : 6 2025-05-07T19:43:01.4932612Z model : 85 2025-05-07T19:43:01.4932779Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4932853Z stepping : 7 2025-05-07T19:43:01.4932932Z microcode : 0x5003901 2025-05-07T19:43:01.4933006Z cpu MHz : 2999.998 2025-05-07T19:43:01.4933099Z cache size : 36608 KB 2025-05-07T19:43:01.4933175Z physical id : 0 2025-05-07T19:43:01.4933249Z siblings : 48 2025-05-07T19:43:01.4933335Z core id : 10 2025-05-07T19:43:01.4933408Z cpu cores : 24 2025-05-07T19:43:01.4933483Z apicid : 21 2025-05-07T19:43:01.4933565Z initial apicid : 21 2025-05-07T19:43:01.4933651Z fpu : yes 2025-05-07T19:43:01.4933729Z fpu_exception : yes 2025-05-07T19:43:01.4933849Z cpuid level : 13 2025-05-07T19:43:01.4933922Z wp : yes 2025-05-07T19:43:01.4935976Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4936345Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4936439Z bogomips : 5999.99 2025-05-07T19:43:01.4936520Z clflush size : 64 2025-05-07T19:43:01.4936599Z cache_alignment : 64 2025-05-07T19:43:01.4936733Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4936813Z power management: 2025-05-07T19:43:01.4936818Z 2025-05-07T19:43:01.4936898Z processor : 59 2025-05-07T19:43:01.4936981Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4937068Z cpu family : 6 2025-05-07T19:43:01.4937139Z model : 85 2025-05-07T19:43:01.4937291Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4937380Z stepping : 7 2025-05-07T19:43:01.4937456Z microcode : 0x5003901 2025-05-07T19:43:01.4937531Z cpu MHz : 2999.998 2025-05-07T19:43:01.4937615Z cache size : 36608 KB 2025-05-07T19:43:01.4937701Z physical id : 0 2025-05-07T19:43:01.4937774Z siblings : 48 2025-05-07T19:43:01.4937848Z core id : 11 2025-05-07T19:43:01.4937931Z cpu cores : 24 2025-05-07T19:43:01.4938006Z apicid : 23 2025-05-07T19:43:01.4938087Z initial apicid : 23 2025-05-07T19:43:01.4938162Z fpu : yes 2025-05-07T19:43:01.4938249Z fpu_exception : yes 2025-05-07T19:43:01.4938326Z cpuid level : 13 2025-05-07T19:43:01.4938403Z wp : yes 2025-05-07T19:43:01.4940793Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4941247Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4941331Z bogomips : 5999.99 2025-05-07T19:43:01.4941421Z clflush size : 64 2025-05-07T19:43:01.4941507Z cache_alignment : 64 2025-05-07T19:43:01.4941645Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4941743Z power management: 2025-05-07T19:43:01.4941748Z 2025-05-07T19:43:01.4941831Z processor : 60 2025-05-07T19:43:01.4941924Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4942011Z cpu family : 6 2025-05-07T19:43:01.4942096Z model : 85 2025-05-07T19:43:01.4942260Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4942342Z stepping : 7 2025-05-07T19:43:01.4942442Z microcode : 0x5003901 2025-05-07T19:43:01.4942521Z cpu MHz : 1378.115 2025-05-07T19:43:01.4942605Z cache size : 36608 KB 2025-05-07T19:43:01.4942690Z physical id : 0 2025-05-07T19:43:01.4942778Z siblings : 48 2025-05-07T19:43:01.4942860Z core id : 12 2025-05-07T19:43:01.4942954Z cpu cores : 24 2025-05-07T19:43:01.4943038Z apicid : 25 2025-05-07T19:43:01.4943143Z initial apicid : 25 2025-05-07T19:43:01.4943228Z fpu : yes 2025-05-07T19:43:01.4943320Z fpu_exception : yes 2025-05-07T19:43:01.4943421Z cpuid level : 13 2025-05-07T19:43:01.4943505Z wp : yes 2025-05-07T19:43:01.4945790Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4946210Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4946301Z bogomips : 5999.99 2025-05-07T19:43:01.4946390Z clflush size : 64 2025-05-07T19:43:01.4946499Z cache_alignment : 64 2025-05-07T19:43:01.4946637Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4946731Z power management: 2025-05-07T19:43:01.4946736Z 2025-05-07T19:43:01.4946839Z processor : 61 2025-05-07T19:43:01.4946936Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4947023Z cpu family : 6 2025-05-07T19:43:01.4947110Z model : 85 2025-05-07T19:43:01.4947299Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4947387Z stepping : 7 2025-05-07T19:43:01.4947478Z microcode : 0x5003901 2025-05-07T19:43:01.4947581Z cpu MHz : 1196.672 2025-05-07T19:43:01.4947672Z cache size : 36608 KB 2025-05-07T19:43:01.4947761Z physical id : 0 2025-05-07T19:43:01.4947847Z siblings : 48 2025-05-07T19:43:01.4947953Z core id : 13 2025-05-07T19:43:01.4948039Z cpu cores : 24 2025-05-07T19:43:01.4948126Z apicid : 27 2025-05-07T19:43:01.4948218Z initial apicid : 27 2025-05-07T19:43:01.4948319Z fpu : yes 2025-05-07T19:43:01.4948411Z fpu_exception : yes 2025-05-07T19:43:01.4948499Z cpuid level : 13 2025-05-07T19:43:01.4948601Z wp : yes 2025-05-07T19:43:01.4950828Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4951281Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4951385Z bogomips : 5999.99 2025-05-07T19:43:01.4951476Z clflush size : 64 2025-05-07T19:43:01.4951571Z cache_alignment : 64 2025-05-07T19:43:01.4951728Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4951822Z power management: 2025-05-07T19:43:01.4951827Z 2025-05-07T19:43:01.4951919Z processor : 62 2025-05-07T19:43:01.4952020Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4952237Z cpu family : 6 2025-05-07T19:43:01.4952318Z model : 85 2025-05-07T19:43:01.4952477Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4952574Z stepping : 7 2025-05-07T19:43:01.4952663Z microcode : 0x5003901 2025-05-07T19:43:01.4952745Z cpu MHz : 1247.620 2025-05-07T19:43:01.4952827Z cache size : 36608 KB 2025-05-07T19:43:01.4952926Z physical id : 0 2025-05-07T19:43:01.4953003Z siblings : 48 2025-05-07T19:43:01.4953080Z core id : 14 2025-05-07T19:43:01.4953176Z cpu cores : 24 2025-05-07T19:43:01.4953253Z apicid : 29 2025-05-07T19:43:01.4953336Z initial apicid : 29 2025-05-07T19:43:01.4953411Z fpu : yes 2025-05-07T19:43:01.4953510Z fpu_exception : yes 2025-05-07T19:43:01.4953590Z cpuid level : 13 2025-05-07T19:43:01.4953664Z wp : yes 2025-05-07T19:43:01.4955800Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4956176Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4956259Z bogomips : 5999.99 2025-05-07T19:43:01.4956352Z clflush size : 64 2025-05-07T19:43:01.4956437Z cache_alignment : 64 2025-05-07T19:43:01.4956564Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4956666Z power management: 2025-05-07T19:43:01.4956670Z 2025-05-07T19:43:01.4956750Z processor : 63 2025-05-07T19:43:01.4956840Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4956918Z cpu family : 6 2025-05-07T19:43:01.4957008Z model : 85 2025-05-07T19:43:01.4957163Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4957245Z stepping : 7 2025-05-07T19:43:01.4957343Z microcode : 0x5003901 2025-05-07T19:43:01.4957422Z cpu MHz : 1200.099 2025-05-07T19:43:01.4957505Z cache size : 36608 KB 2025-05-07T19:43:01.4957591Z physical id : 0 2025-05-07T19:43:01.4957687Z siblings : 48 2025-05-07T19:43:01.4957769Z core id : 15 2025-05-07T19:43:01.4957852Z cpu cores : 24 2025-05-07T19:43:01.4957948Z apicid : 31 2025-05-07T19:43:01.4958032Z initial apicid : 31 2025-05-07T19:43:01.4958111Z fpu : yes 2025-05-07T19:43:01.4958196Z fpu_exception : yes 2025-05-07T19:43:01.4958289Z cpuid level : 13 2025-05-07T19:43:01.4958364Z wp : yes 2025-05-07T19:43:01.4960416Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4960857Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4960939Z bogomips : 5999.99 2025-05-07T19:43:01.4961020Z clflush size : 64 2025-05-07T19:43:01.4961121Z cache_alignment : 64 2025-05-07T19:43:01.4961247Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4961330Z power management: 2025-05-07T19:43:01.4961334Z 2025-05-07T19:43:01.4961428Z processor : 64 2025-05-07T19:43:01.4961516Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4961599Z cpu family : 6 2025-05-07T19:43:01.4961676Z model : 85 2025-05-07T19:43:01.4961843Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4961981Z stepping : 7 2025-05-07T19:43:01.4962065Z microcode : 0x5003901 2025-05-07T19:43:01.4962160Z cpu MHz : 2999.998 2025-05-07T19:43:01.4962242Z cache size : 36608 KB 2025-05-07T19:43:01.4962323Z physical id : 0 2025-05-07T19:43:01.4962573Z siblings : 48 2025-05-07T19:43:01.4962670Z core id : 16 2025-05-07T19:43:01.4962752Z cpu cores : 24 2025-05-07T19:43:01.4962834Z apicid : 33 2025-05-07T19:43:01.4962937Z initial apicid : 33 2025-05-07T19:43:01.4963016Z fpu : yes 2025-05-07T19:43:01.4963106Z fpu_exception : yes 2025-05-07T19:43:01.4963189Z cpuid level : 13 2025-05-07T19:43:01.4963284Z wp : yes 2025-05-07T19:43:01.4965511Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4965919Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4966006Z bogomips : 5999.99 2025-05-07T19:43:01.4966090Z clflush size : 64 2025-05-07T19:43:01.4966178Z cache_alignment : 64 2025-05-07T19:43:01.4966325Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4966412Z power management: 2025-05-07T19:43:01.4966416Z 2025-05-07T19:43:01.4966499Z processor : 65 2025-05-07T19:43:01.4966606Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4966689Z cpu family : 6 2025-05-07T19:43:01.4966772Z model : 85 2025-05-07T19:43:01.4966935Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4967038Z stepping : 7 2025-05-07T19:43:01.4967131Z microcode : 0x5003901 2025-05-07T19:43:01.4967214Z cpu MHz : 1203.227 2025-05-07T19:43:01.4967315Z cache size : 36608 KB 2025-05-07T19:43:01.4967402Z physical id : 0 2025-05-07T19:43:01.4967485Z siblings : 48 2025-05-07T19:43:01.4967565Z core id : 17 2025-05-07T19:43:01.4967661Z cpu cores : 24 2025-05-07T19:43:01.4967742Z apicid : 35 2025-05-07T19:43:01.4967831Z initial apicid : 35 2025-05-07T19:43:01.4967926Z fpu : yes 2025-05-07T19:43:01.4968013Z fpu_exception : yes 2025-05-07T19:43:01.4968097Z cpuid level : 13 2025-05-07T19:43:01.4968176Z wp : yes 2025-05-07T19:43:01.4970342Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4970783Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4970884Z bogomips : 5999.99 2025-05-07T19:43:01.4970970Z clflush size : 64 2025-05-07T19:43:01.4971059Z cache_alignment : 64 2025-05-07T19:43:01.4971192Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4971296Z power management: 2025-05-07T19:43:01.4971300Z 2025-05-07T19:43:01.4971383Z processor : 66 2025-05-07T19:43:01.4971475Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4971572Z cpu family : 6 2025-05-07T19:43:01.4971652Z model : 85 2025-05-07T19:43:01.4971817Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4971900Z stepping : 7 2025-05-07T19:43:01.4972002Z microcode : 0x5003901 2025-05-07T19:43:01.4972090Z cpu MHz : 2999.998 2025-05-07T19:43:01.4972175Z cache size : 36608 KB 2025-05-07T19:43:01.4972273Z physical id : 0 2025-05-07T19:43:01.4972354Z siblings : 48 2025-05-07T19:43:01.4972435Z core id : 18 2025-05-07T19:43:01.4972516Z cpu cores : 24 2025-05-07T19:43:01.4972611Z apicid : 37 2025-05-07T19:43:01.4972698Z initial apicid : 37 2025-05-07T19:43:01.4972777Z fpu : yes 2025-05-07T19:43:01.4972864Z fpu_exception : yes 2025-05-07T19:43:01.4972964Z cpuid level : 13 2025-05-07T19:43:01.4973043Z wp : yes 2025-05-07T19:43:01.4975261Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4975651Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4975734Z bogomips : 5999.99 2025-05-07T19:43:01.4975816Z clflush size : 64 2025-05-07T19:43:01.4975913Z cache_alignment : 64 2025-05-07T19:43:01.4976041Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4976125Z power management: 2025-05-07T19:43:01.4976129Z 2025-05-07T19:43:01.4976224Z processor : 67 2025-05-07T19:43:01.4976313Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4976392Z cpu family : 6 2025-05-07T19:43:01.4976483Z model : 85 2025-05-07T19:43:01.4976638Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4976726Z stepping : 7 2025-05-07T19:43:01.4976813Z microcode : 0x5003901 2025-05-07T19:43:01.4976909Z cpu MHz : 2999.998 2025-05-07T19:43:01.4976995Z cache size : 36608 KB 2025-05-07T19:43:01.4977081Z physical id : 0 2025-05-07T19:43:01.4977173Z siblings : 48 2025-05-07T19:43:01.4977252Z core id : 19 2025-05-07T19:43:01.4977330Z cpu cores : 24 2025-05-07T19:43:01.4977407Z apicid : 39 2025-05-07T19:43:01.4977504Z initial apicid : 39 2025-05-07T19:43:01.4977580Z fpu : yes 2025-05-07T19:43:01.4977664Z fpu_exception : yes 2025-05-07T19:43:01.4977743Z cpuid level : 13 2025-05-07T19:43:01.4977835Z wp : yes 2025-05-07T19:43:01.4980188Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4980668Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4980759Z bogomips : 5999.99 2025-05-07T19:43:01.4980900Z clflush size : 64 2025-05-07T19:43:01.4980991Z cache_alignment : 64 2025-05-07T19:43:01.4981147Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4981240Z power management: 2025-05-07T19:43:01.4981245Z 2025-05-07T19:43:01.4981333Z processor : 68 2025-05-07T19:43:01.4981447Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4981533Z cpu family : 6 2025-05-07T19:43:01.4981616Z model : 85 2025-05-07T19:43:01.4981800Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4981887Z stepping : 7 2025-05-07T19:43:01.4981981Z microcode : 0x5003901 2025-05-07T19:43:01.4982068Z cpu MHz : 1502.149 2025-05-07T19:43:01.4982174Z cache size : 36608 KB 2025-05-07T19:43:01.4982262Z physical id : 0 2025-05-07T19:43:01.4982350Z siblings : 48 2025-05-07T19:43:01.4982435Z core id : 20 2025-05-07T19:43:01.4982537Z cpu cores : 24 2025-05-07T19:43:01.4982622Z apicid : 41 2025-05-07T19:43:01.4982713Z initial apicid : 41 2025-05-07T19:43:01.4982816Z fpu : yes 2025-05-07T19:43:01.4982910Z fpu_exception : yes 2025-05-07T19:43:01.4982997Z cpuid level : 13 2025-05-07T19:43:01.4983079Z wp : yes 2025-05-07T19:43:01.4985375Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4985785Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4985890Z bogomips : 5999.99 2025-05-07T19:43:01.4985979Z clflush size : 64 2025-05-07T19:43:01.4986072Z cache_alignment : 64 2025-05-07T19:43:01.4986210Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4986320Z power management: 2025-05-07T19:43:01.4986325Z 2025-05-07T19:43:01.4986413Z processor : 69 2025-05-07T19:43:01.4986510Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4986613Z cpu family : 6 2025-05-07T19:43:01.4986699Z model : 85 2025-05-07T19:43:01.4986870Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4986960Z stepping : 7 2025-05-07T19:43:01.4987071Z microcode : 0x5003901 2025-05-07T19:43:01.4987163Z cpu MHz : 1206.321 2025-05-07T19:43:01.4987252Z cache size : 36608 KB 2025-05-07T19:43:01.4987358Z physical id : 0 2025-05-07T19:43:01.4987445Z siblings : 48 2025-05-07T19:43:01.4987534Z core id : 21 2025-05-07T19:43:01.4987621Z cpu cores : 24 2025-05-07T19:43:01.4987727Z apicid : 43 2025-05-07T19:43:01.4987819Z initial apicid : 43 2025-05-07T19:43:01.4987904Z fpu : yes 2025-05-07T19:43:01.4988015Z fpu_exception : yes 2025-05-07T19:43:01.4988106Z cpuid level : 13 2025-05-07T19:43:01.4988190Z wp : yes 2025-05-07T19:43:01.4990426Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.4990829Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.4991437Z bogomips : 5999.99 2025-05-07T19:43:01.4991544Z clflush size : 64 2025-05-07T19:43:01.4991637Z cache_alignment : 64 2025-05-07T19:43:01.4991782Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.4991870Z power management: 2025-05-07T19:43:01.4991874Z 2025-05-07T19:43:01.4991975Z processor : 70 2025-05-07T19:43:01.4992069Z vendor_id : GenuineIntel 2025-05-07T19:43:01.4992266Z cpu family : 6 2025-05-07T19:43:01.4992362Z model : 85 2025-05-07T19:43:01.4992520Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.4992598Z stepping : 7 2025-05-07T19:43:01.4992681Z microcode : 0x5003901 2025-05-07T19:43:01.4992773Z cpu MHz : 2999.998 2025-05-07T19:43:01.4992858Z cache size : 36608 KB 2025-05-07T19:43:01.4992938Z physical id : 0 2025-05-07T19:43:01.4993031Z siblings : 48 2025-05-07T19:43:01.4993109Z core id : 22 2025-05-07T19:43:01.4993189Z cpu cores : 24 2025-05-07T19:43:01.4993266Z apicid : 45 2025-05-07T19:43:01.4993363Z initial apicid : 45 2025-05-07T19:43:01.4993438Z fpu : yes 2025-05-07T19:43:01.4993524Z fpu_exception : yes 2025-05-07T19:43:01.4993618Z cpuid level : 13 2025-05-07T19:43:01.4993694Z wp : yes 2025-05-07T19:43:01.5011051Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5011548Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5011645Z bogomips : 5999.99 2025-05-07T19:43:01.5011727Z clflush size : 64 2025-05-07T19:43:01.5011813Z cache_alignment : 64 2025-05-07T19:43:01.5011962Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5012049Z power management: 2025-05-07T19:43:01.5012055Z 2025-05-07T19:43:01.5012136Z processor : 71 2025-05-07T19:43:01.5012236Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5012316Z cpu family : 6 2025-05-07T19:43:01.5012394Z model : 85 2025-05-07T19:43:01.5012676Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5012874Z stepping : 7 2025-05-07T19:43:01.5012954Z microcode : 0x5003901 2025-05-07T19:43:01.5013028Z cpu MHz : 1273.168 2025-05-07T19:43:01.5013112Z cache size : 36608 KB 2025-05-07T19:43:01.5013195Z physical id : 0 2025-05-07T19:43:01.5013271Z siblings : 48 2025-05-07T19:43:01.5013341Z core id : 23 2025-05-07T19:43:01.5013422Z cpu cores : 24 2025-05-07T19:43:01.5013497Z apicid : 47 2025-05-07T19:43:01.5013576Z initial apicid : 47 2025-05-07T19:43:01.5013655Z fpu : yes 2025-05-07T19:43:01.5013732Z fpu_exception : yes 2025-05-07T19:43:01.5013808Z cpuid level : 13 2025-05-07T19:43:01.5013883Z wp : yes 2025-05-07T19:43:01.5015955Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5016327Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5016494Z bogomips : 5999.99 2025-05-07T19:43:01.5016571Z clflush size : 64 2025-05-07T19:43:01.5016651Z cache_alignment : 64 2025-05-07T19:43:01.5016774Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5016866Z power management: 2025-05-07T19:43:01.5016871Z 2025-05-07T19:43:01.5016945Z processor : 72 2025-05-07T19:43:01.5017030Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5017113Z cpu family : 6 2025-05-07T19:43:01.5017186Z model : 85 2025-05-07T19:43:01.5017337Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5017416Z stepping : 7 2025-05-07T19:43:01.5017507Z microcode : 0x5003901 2025-05-07T19:43:01.5017582Z cpu MHz : 2999.998 2025-05-07T19:43:01.5017659Z cache size : 36608 KB 2025-05-07T19:43:01.5017743Z physical id : 1 2025-05-07T19:43:01.5017819Z siblings : 48 2025-05-07T19:43:01.5017891Z core id : 0 2025-05-07T19:43:01.5017965Z cpu cores : 24 2025-05-07T19:43:01.5018044Z apicid : 65 2025-05-07T19:43:01.5018125Z initial apicid : 65 2025-05-07T19:43:01.5018194Z fpu : yes 2025-05-07T19:43:01.5018280Z fpu_exception : yes 2025-05-07T19:43:01.5018353Z cpuid level : 13 2025-05-07T19:43:01.5018425Z wp : yes 2025-05-07T19:43:01.5020840Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5021313Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5021396Z bogomips : 5999.99 2025-05-07T19:43:01.5021492Z clflush size : 64 2025-05-07T19:43:01.5021576Z cache_alignment : 64 2025-05-07T19:43:01.5021710Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5021795Z power management: 2025-05-07T19:43:01.5021799Z 2025-05-07T19:43:01.5021890Z processor : 73 2025-05-07T19:43:01.5021979Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5022057Z cpu family : 6 2025-05-07T19:43:01.5022150Z model : 85 2025-05-07T19:43:01.5022312Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5022392Z stepping : 7 2025-05-07T19:43:01.5022476Z microcode : 0x5003901 2025-05-07T19:43:01.5022568Z cpu MHz : 3107.655 2025-05-07T19:43:01.5022648Z cache size : 36608 KB 2025-05-07T19:43:01.5022731Z physical id : 1 2025-05-07T19:43:01.5022823Z siblings : 48 2025-05-07T19:43:01.5022904Z core id : 1 2025-05-07T19:43:01.5022981Z cpu cores : 24 2025-05-07T19:43:01.5023059Z apicid : 67 2025-05-07T19:43:01.5023151Z initial apicid : 67 2025-05-07T19:43:01.5023232Z fpu : yes 2025-05-07T19:43:01.5023314Z fpu_exception : yes 2025-05-07T19:43:01.5023405Z cpuid level : 13 2025-05-07T19:43:01.5023482Z wp : yes 2025-05-07T19:43:01.5025726Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5026140Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5026225Z bogomips : 5999.99 2025-05-07T19:43:01.5026310Z clflush size : 64 2025-05-07T19:43:01.5026456Z cache_alignment : 64 2025-05-07T19:43:01.5026592Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5026677Z power management: 2025-05-07T19:43:01.5026682Z 2025-05-07T19:43:01.5026764Z processor : 74 2025-05-07T19:43:01.5026865Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5026940Z cpu family : 6 2025-05-07T19:43:01.5027021Z model : 85 2025-05-07T19:43:01.5027237Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5027386Z stepping : 7 2025-05-07T19:43:01.5027499Z microcode : 0x5003901 2025-05-07T19:43:01.5027576Z cpu MHz : 2999.998 2025-05-07T19:43:01.5027665Z cache size : 36608 KB 2025-05-07T19:43:01.5027743Z physical id : 1 2025-05-07T19:43:01.5027819Z siblings : 48 2025-05-07T19:43:01.5027906Z core id : 2 2025-05-07T19:43:01.5027986Z cpu cores : 24 2025-05-07T19:43:01.5028060Z apicid : 69 2025-05-07T19:43:01.5028141Z initial apicid : 69 2025-05-07T19:43:01.5028224Z fpu : yes 2025-05-07T19:43:01.5028309Z fpu_exception : yes 2025-05-07T19:43:01.5028534Z cpuid level : 13 2025-05-07T19:43:01.5028623Z wp : yes 2025-05-07T19:43:01.5030979Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5031446Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5031545Z bogomips : 5999.99 2025-05-07T19:43:01.5031625Z clflush size : 64 2025-05-07T19:43:01.5031709Z cache_alignment : 64 2025-05-07T19:43:01.5031858Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5031941Z power management: 2025-05-07T19:43:01.5031946Z 2025-05-07T19:43:01.5032027Z processor : 75 2025-05-07T19:43:01.5032235Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5032314Z cpu family : 6 2025-05-07T19:43:01.5032387Z model : 85 2025-05-07T19:43:01.5032545Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5032629Z stepping : 7 2025-05-07T19:43:01.5032712Z microcode : 0x5003901 2025-05-07T19:43:01.5032788Z cpu MHz : 3150.831 2025-05-07T19:43:01.5032875Z cache size : 36608 KB 2025-05-07T19:43:01.5032955Z physical id : 1 2025-05-07T19:43:01.5033032Z siblings : 48 2025-05-07T19:43:01.5033106Z core id : 3 2025-05-07T19:43:01.5033190Z cpu cores : 24 2025-05-07T19:43:01.5033265Z apicid : 71 2025-05-07T19:43:01.5033349Z initial apicid : 71 2025-05-07T19:43:01.5033423Z fpu : yes 2025-05-07T19:43:01.5033509Z fpu_exception : yes 2025-05-07T19:43:01.5033590Z cpuid level : 13 2025-05-07T19:43:01.5033666Z wp : yes 2025-05-07T19:43:01.5035830Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5036331Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5036414Z bogomips : 5999.99 2025-05-07T19:43:01.5036498Z clflush size : 64 2025-05-07T19:43:01.5036577Z cache_alignment : 64 2025-05-07T19:43:01.5036699Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5036836Z power management: 2025-05-07T19:43:01.5036840Z 2025-05-07T19:43:01.5036914Z processor : 76 2025-05-07T19:43:01.5036998Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5037078Z cpu family : 6 2025-05-07T19:43:01.5037149Z model : 85 2025-05-07T19:43:01.5037301Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5037372Z stepping : 7 2025-05-07T19:43:01.5037455Z microcode : 0x5003901 2025-05-07T19:43:01.5037530Z cpu MHz : 3160.991 2025-05-07T19:43:01.5037605Z cache size : 36608 KB 2025-05-07T19:43:01.5037686Z physical id : 1 2025-05-07T19:43:01.5037758Z siblings : 48 2025-05-07T19:43:01.5037828Z core id : 4 2025-05-07T19:43:01.5037902Z cpu cores : 24 2025-05-07T19:43:01.5037979Z apicid : 73 2025-05-07T19:43:01.5038056Z initial apicid : 73 2025-05-07T19:43:01.5038132Z fpu : yes 2025-05-07T19:43:01.5038210Z fpu_exception : yes 2025-05-07T19:43:01.5038291Z cpuid level : 13 2025-05-07T19:43:01.5038360Z wp : yes 2025-05-07T19:43:01.5040417Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5040791Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5040867Z bogomips : 5999.99 2025-05-07T19:43:01.5040995Z clflush size : 64 2025-05-07T19:43:01.5041087Z cache_alignment : 64 2025-05-07T19:43:01.5041209Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5041292Z power management: 2025-05-07T19:43:01.5041296Z 2025-05-07T19:43:01.5041379Z processor : 77 2025-05-07T19:43:01.5041462Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5041533Z cpu family : 6 2025-05-07T19:43:01.5041604Z model : 85 2025-05-07T19:43:01.5041763Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5041836Z stepping : 7 2025-05-07T19:43:01.5041917Z microcode : 0x5003901 2025-05-07T19:43:01.5042003Z cpu MHz : 3141.379 2025-05-07T19:43:01.5042079Z cache size : 36608 KB 2025-05-07T19:43:01.5042154Z physical id : 1 2025-05-07T19:43:01.5042227Z siblings : 48 2025-05-07T19:43:01.5042310Z core id : 5 2025-05-07T19:43:01.5042384Z cpu cores : 24 2025-05-07T19:43:01.5042455Z apicid : 75 2025-05-07T19:43:01.5042545Z initial apicid : 75 2025-05-07T19:43:01.5042618Z fpu : yes 2025-05-07T19:43:01.5042698Z fpu_exception : yes 2025-05-07T19:43:01.5042771Z cpuid level : 13 2025-05-07T19:43:01.5042854Z wp : yes 2025-05-07T19:43:01.5044904Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5045280Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5045356Z bogomips : 5999.99 2025-05-07T19:43:01.5045433Z clflush size : 64 2025-05-07T19:43:01.5045514Z cache_alignment : 64 2025-05-07T19:43:01.5045649Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5045725Z power management: 2025-05-07T19:43:01.5045777Z 2025-05-07T19:43:01.5045850Z processor : 78 2025-05-07T19:43:01.5045943Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5046015Z cpu family : 6 2025-05-07T19:43:01.5046083Z model : 85 2025-05-07T19:43:01.5046229Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5046309Z stepping : 7 2025-05-07T19:43:01.5046384Z microcode : 0x5003901 2025-05-07T19:43:01.5046456Z cpu MHz : 3152.294 2025-05-07T19:43:01.5046537Z cache size : 36608 KB 2025-05-07T19:43:01.5046609Z physical id : 1 2025-05-07T19:43:01.5046679Z siblings : 48 2025-05-07T19:43:01.5046746Z core id : 6 2025-05-07T19:43:01.5046828Z cpu cores : 24 2025-05-07T19:43:01.5046899Z apicid : 77 2025-05-07T19:43:01.5046973Z initial apicid : 77 2025-05-07T19:43:01.5047058Z fpu : yes 2025-05-07T19:43:01.5047138Z fpu_exception : yes 2025-05-07T19:43:01.5047215Z cpuid level : 13 2025-05-07T19:43:01.5047285Z wp : yes 2025-05-07T19:43:01.5049338Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5049705Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5049790Z bogomips : 5999.99 2025-05-07T19:43:01.5049868Z clflush size : 64 2025-05-07T19:43:01.5049947Z cache_alignment : 64 2025-05-07T19:43:01.5050117Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5050201Z power management: 2025-05-07T19:43:01.5050205Z 2025-05-07T19:43:01.5050281Z processor : 79 2025-05-07T19:43:01.5050368Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5050449Z cpu family : 6 2025-05-07T19:43:01.5050519Z model : 85 2025-05-07T19:43:01.5050668Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5050742Z stepping : 7 2025-05-07T19:43:01.5050830Z microcode : 0x5003901 2025-05-07T19:43:01.5050906Z cpu MHz : 3063.344 2025-05-07T19:43:01.5050984Z cache size : 36608 KB 2025-05-07T19:43:01.5051071Z physical id : 1 2025-05-07T19:43:01.5051145Z siblings : 48 2025-05-07T19:43:01.5051217Z core id : 7 2025-05-07T19:43:01.5051289Z cpu cores : 24 2025-05-07T19:43:01.5051369Z apicid : 79 2025-05-07T19:43:01.5051448Z initial apicid : 79 2025-05-07T19:43:01.5051517Z fpu : yes 2025-05-07T19:43:01.5051603Z fpu_exception : yes 2025-05-07T19:43:01.5051679Z cpuid level : 13 2025-05-07T19:43:01.5051752Z wp : yes 2025-05-07T19:43:01.5053817Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5054184Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5054260Z bogomips : 5999.99 2025-05-07T19:43:01.5054343Z clflush size : 64 2025-05-07T19:43:01.5054423Z cache_alignment : 64 2025-05-07T19:43:01.5054541Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5054621Z power management: 2025-05-07T19:43:01.5054625Z 2025-05-07T19:43:01.5054707Z processor : 80 2025-05-07T19:43:01.5054791Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5054925Z cpu family : 6 2025-05-07T19:43:01.5055004Z model : 85 2025-05-07T19:43:01.5055165Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5055237Z stepping : 7 2025-05-07T19:43:01.5055313Z microcode : 0x5003901 2025-05-07T19:43:01.5055392Z cpu MHz : 3056.981 2025-05-07T19:43:01.5055468Z cache size : 36608 KB 2025-05-07T19:43:01.5055539Z physical id : 1 2025-05-07T19:43:01.5055632Z siblings : 48 2025-05-07T19:43:01.5055703Z core id : 8 2025-05-07T19:43:01.5055776Z cpu cores : 24 2025-05-07T19:43:01.5055848Z apicid : 81 2025-05-07T19:43:01.5055934Z initial apicid : 81 2025-05-07T19:43:01.5056005Z fpu : yes 2025-05-07T19:43:01.5056081Z fpu_exception : yes 2025-05-07T19:43:01.5056153Z cpuid level : 13 2025-05-07T19:43:01.5056232Z wp : yes 2025-05-07T19:43:01.5058281Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5058659Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5058737Z bogomips : 5999.99 2025-05-07T19:43:01.5058811Z clflush size : 64 2025-05-07T19:43:01.5058908Z cache_alignment : 64 2025-05-07T19:43:01.5059029Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5059136Z power management: 2025-05-07T19:43:01.5059206Z 2025-05-07T19:43:01.5059307Z processor : 81 2025-05-07T19:43:01.5059415Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5059673Z cpu family : 6 2025-05-07T19:43:01.5059755Z model : 85 2025-05-07T19:43:01.5060131Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5060248Z stepping : 7 2025-05-07T19:43:01.5060361Z microcode : 0x5003901 2025-05-07T19:43:01.5060462Z cpu MHz : 3206.943 2025-05-07T19:43:01.5060571Z cache size : 36608 KB 2025-05-07T19:43:01.5060731Z physical id : 1 2025-05-07T19:43:01.5060921Z siblings : 48 2025-05-07T19:43:01.5061013Z core id : 9 2025-05-07T19:43:01.5061109Z cpu cores : 24 2025-05-07T19:43:01.5061182Z apicid : 83 2025-05-07T19:43:01.5061267Z initial apicid : 83 2025-05-07T19:43:01.5061342Z fpu : yes 2025-05-07T19:43:01.5061436Z fpu_exception : yes 2025-05-07T19:43:01.5061516Z cpuid level : 13 2025-05-07T19:43:01.5061593Z wp : yes 2025-05-07T19:43:01.5063992Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5064425Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5064508Z bogomips : 5999.99 2025-05-07T19:43:01.5064596Z clflush size : 64 2025-05-07T19:43:01.5064683Z cache_alignment : 64 2025-05-07T19:43:01.5064811Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5064908Z power management: 2025-05-07T19:43:01.5064913Z 2025-05-07T19:43:01.5065007Z processor : 82 2025-05-07T19:43:01.5065096Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5065176Z cpu family : 6 2025-05-07T19:43:01.5065256Z model : 85 2025-05-07T19:43:01.5065484Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5065577Z stepping : 7 2025-05-07T19:43:01.5065668Z microcode : 0x5003901 2025-05-07T19:43:01.5065749Z cpu MHz : 3838.851 2025-05-07T19:43:01.5065833Z cache size : 36608 KB 2025-05-07T19:43:01.5065912Z physical id : 1 2025-05-07T19:43:01.5065997Z siblings : 48 2025-05-07T19:43:01.5066072Z core id : 10 2025-05-07T19:43:01.5066152Z cpu cores : 24 2025-05-07T19:43:01.5066228Z apicid : 85 2025-05-07T19:43:01.5066316Z initial apicid : 85 2025-05-07T19:43:01.5066390Z fpu : yes 2025-05-07T19:43:01.5066474Z fpu_exception : yes 2025-05-07T19:43:01.5066559Z cpuid level : 13 2025-05-07T19:43:01.5066633Z wp : yes 2025-05-07T19:43:01.5068861Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5069268Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5069350Z bogomips : 5999.99 2025-05-07T19:43:01.5069431Z clflush size : 64 2025-05-07T19:43:01.5069524Z cache_alignment : 64 2025-05-07T19:43:01.5069657Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5069740Z power management: 2025-05-07T19:43:01.5069744Z 2025-05-07T19:43:01.5069835Z processor : 83 2025-05-07T19:43:01.5069976Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5070055Z cpu family : 6 2025-05-07T19:43:01.5070129Z model : 85 2025-05-07T19:43:01.5070300Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5070386Z stepping : 7 2025-05-07T19:43:01.5070468Z microcode : 0x5003901 2025-05-07T19:43:01.5070547Z cpu MHz : 3213.681 2025-05-07T19:43:01.5070635Z cache size : 36608 KB 2025-05-07T19:43:01.5070716Z physical id : 1 2025-05-07T19:43:01.5070792Z siblings : 48 2025-05-07T19:43:01.5070877Z core id : 11 2025-05-07T19:43:01.5070955Z cpu cores : 24 2025-05-07T19:43:01.5071033Z apicid : 87 2025-05-07T19:43:01.5071118Z initial apicid : 87 2025-05-07T19:43:01.5071204Z fpu : yes 2025-05-07T19:43:01.5071288Z fpu_exception : yes 2025-05-07T19:43:01.5071370Z cpuid level : 13 2025-05-07T19:43:01.5071455Z wp : yes 2025-05-07T19:43:01.5073659Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5074028Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5074115Z bogomips : 5999.99 2025-05-07T19:43:01.5074188Z clflush size : 64 2025-05-07T19:43:01.5074269Z cache_alignment : 64 2025-05-07T19:43:01.5074398Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5074475Z power management: 2025-05-07T19:43:01.5074480Z 2025-05-07T19:43:01.5074554Z processor : 84 2025-05-07T19:43:01.5074639Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5074721Z cpu family : 6 2025-05-07T19:43:01.5074790Z model : 85 2025-05-07T19:43:01.5074940Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5075022Z stepping : 7 2025-05-07T19:43:01.5075150Z microcode : 0x5003901 2025-05-07T19:43:01.5075221Z cpu MHz : 3142.216 2025-05-07T19:43:01.5075298Z cache size : 36608 KB 2025-05-07T19:43:01.5075382Z physical id : 1 2025-05-07T19:43:01.5075454Z siblings : 48 2025-05-07T19:43:01.5075523Z core id : 12 2025-05-07T19:43:01.5075600Z cpu cores : 24 2025-05-07T19:43:01.5075671Z apicid : 89 2025-05-07T19:43:01.5075748Z initial apicid : 89 2025-05-07T19:43:01.5075818Z fpu : yes 2025-05-07T19:43:01.5075908Z fpu_exception : yes 2025-05-07T19:43:01.5075982Z cpuid level : 13 2025-05-07T19:43:01.5076051Z wp : yes 2025-05-07T19:43:01.5078087Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5078455Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5078532Z bogomips : 5999.99 2025-05-07T19:43:01.5078613Z clflush size : 64 2025-05-07T19:43:01.5078692Z cache_alignment : 64 2025-05-07T19:43:01.5078812Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5078896Z power management: 2025-05-07T19:43:01.5078900Z 2025-05-07T19:43:01.5078973Z processor : 85 2025-05-07T19:43:01.5079055Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5079127Z cpu family : 6 2025-05-07T19:43:01.5079207Z model : 85 2025-05-07T19:43:01.5079403Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5079477Z stepping : 7 2025-05-07T19:43:01.5079560Z microcode : 0x5003901 2025-05-07T19:43:01.5079635Z cpu MHz : 3137.836 2025-05-07T19:43:01.5079712Z cache size : 36608 KB 2025-05-07T19:43:01.5079786Z physical id : 1 2025-05-07T19:43:01.5079866Z siblings : 48 2025-05-07T19:43:01.5079936Z core id : 13 2025-05-07T19:43:01.5080009Z cpu cores : 24 2025-05-07T19:43:01.5080086Z apicid : 91 2025-05-07T19:43:01.5080163Z initial apicid : 91 2025-05-07T19:43:01.5080233Z fpu : yes 2025-05-07T19:43:01.5080313Z fpu_exception : yes 2025-05-07T19:43:01.5080398Z cpuid level : 13 2025-05-07T19:43:01.5080468Z wp : yes 2025-05-07T19:43:01.5082521Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5082896Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5082972Z bogomips : 5999.99 2025-05-07T19:43:01.5083046Z clflush size : 64 2025-05-07T19:43:01.5083133Z cache_alignment : 64 2025-05-07T19:43:01.5083253Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5083330Z power management: 2025-05-07T19:43:01.5083334Z 2025-05-07T19:43:01.5083414Z processor : 86 2025-05-07T19:43:01.5083498Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5083571Z cpu family : 6 2025-05-07T19:43:01.5083639Z model : 85 2025-05-07T19:43:01.5083802Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5083876Z stepping : 7 2025-05-07T19:43:01.5083953Z microcode : 0x5003901 2025-05-07T19:43:01.5084032Z cpu MHz : 3127.412 2025-05-07T19:43:01.5084194Z cache size : 36608 KB 2025-05-07T19:43:01.5084273Z physical id : 1 2025-05-07T19:43:01.5084348Z siblings : 48 2025-05-07T19:43:01.5084430Z core id : 14 2025-05-07T19:43:01.5084505Z cpu cores : 24 2025-05-07T19:43:01.5084579Z apicid : 93 2025-05-07T19:43:01.5084667Z initial apicid : 93 2025-05-07T19:43:01.5084739Z fpu : yes 2025-05-07T19:43:01.5084819Z fpu_exception : yes 2025-05-07T19:43:01.5084897Z cpuid level : 13 2025-05-07T19:43:01.5084977Z wp : yes 2025-05-07T19:43:01.5087028Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5087399Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5087477Z bogomips : 5999.99 2025-05-07T19:43:01.5087549Z clflush size : 64 2025-05-07T19:43:01.5087628Z cache_alignment : 64 2025-05-07T19:43:01.5087756Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5087831Z power management: 2025-05-07T19:43:01.5087835Z 2025-05-07T19:43:01.5087909Z processor : 87 2025-05-07T19:43:01.5087996Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5088072Z cpu family : 6 2025-05-07T19:43:01.5088142Z model : 85 2025-05-07T19:43:01.5088292Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5088423Z stepping : 7 2025-05-07T19:43:01.5088501Z microcode : 0x5003901 2025-05-07T19:43:01.5088572Z cpu MHz : 3093.237 2025-05-07T19:43:01.5088654Z cache size : 36608 KB 2025-05-07T19:43:01.5088732Z physical id : 1 2025-05-07T19:43:01.5088802Z siblings : 48 2025-05-07T19:43:01.5088875Z core id : 15 2025-05-07T19:43:01.5088954Z cpu cores : 24 2025-05-07T19:43:01.5089026Z apicid : 95 2025-05-07T19:43:01.5089101Z initial apicid : 95 2025-05-07T19:43:01.5089169Z fpu : yes 2025-05-07T19:43:01.5089254Z fpu_exception : yes 2025-05-07T19:43:01.5089333Z cpuid level : 13 2025-05-07T19:43:01.5089403Z wp : yes 2025-05-07T19:43:01.5091463Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5091834Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5091919Z bogomips : 5999.99 2025-05-07T19:43:01.5091999Z clflush size : 64 2025-05-07T19:43:01.5092077Z cache_alignment : 64 2025-05-07T19:43:01.5092199Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5092288Z power management: 2025-05-07T19:43:01.5092292Z 2025-05-07T19:43:01.5092365Z processor : 88 2025-05-07T19:43:01.5092445Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5092527Z cpu family : 6 2025-05-07T19:43:01.5092597Z model : 85 2025-05-07T19:43:01.5092747Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5092820Z stepping : 7 2025-05-07T19:43:01.5092911Z microcode : 0x5003901 2025-05-07T19:43:01.5092983Z cpu MHz : 2999.998 2025-05-07T19:43:01.5093057Z cache size : 36608 KB 2025-05-07T19:43:01.5093137Z physical id : 1 2025-05-07T19:43:01.5093272Z siblings : 48 2025-05-07T19:43:01.5093343Z core id : 16 2025-05-07T19:43:01.5093417Z cpu cores : 24 2025-05-07T19:43:01.5093494Z apicid : 97 2025-05-07T19:43:01.5093570Z initial apicid : 97 2025-05-07T19:43:01.5093641Z fpu : yes 2025-05-07T19:43:01.5093718Z fpu_exception : yes 2025-05-07T19:43:01.5093797Z cpuid level : 13 2025-05-07T19:43:01.5093865Z wp : yes 2025-05-07T19:43:01.5095915Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5096289Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5096365Z bogomips : 5999.99 2025-05-07T19:43:01.5096441Z clflush size : 64 2025-05-07T19:43:01.5096526Z cache_alignment : 64 2025-05-07T19:43:01.5096645Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5096722Z power management: 2025-05-07T19:43:01.5096725Z 2025-05-07T19:43:01.5096810Z processor : 89 2025-05-07T19:43:01.5096890Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5096963Z cpu family : 6 2025-05-07T19:43:01.5097043Z model : 85 2025-05-07T19:43:01.5097193Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5097266Z stepping : 7 2025-05-07T19:43:01.5097345Z microcode : 0x5003901 2025-05-07T19:43:01.5097477Z cpu MHz : 2999.998 2025-05-07T19:43:01.5097553Z cache size : 36608 KB 2025-05-07T19:43:01.5097626Z physical id : 1 2025-05-07T19:43:01.5097698Z siblings : 48 2025-05-07T19:43:01.5097778Z core id : 17 2025-05-07T19:43:01.5097849Z cpu cores : 24 2025-05-07T19:43:01.5097918Z apicid : 99 2025-05-07T19:43:01.5097998Z initial apicid : 99 2025-05-07T19:43:01.5098065Z fpu : yes 2025-05-07T19:43:01.5098140Z fpu_exception : yes 2025-05-07T19:43:01.5098212Z cpuid level : 13 2025-05-07T19:43:01.5098286Z wp : yes 2025-05-07T19:43:01.5100989Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5101400Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5101486Z bogomips : 5999.99 2025-05-07T19:43:01.5101572Z clflush size : 64 2025-05-07T19:43:01.5101657Z cache_alignment : 64 2025-05-07T19:43:01.5101796Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5101877Z power management: 2025-05-07T19:43:01.5101882Z 2025-05-07T19:43:01.5101962Z processor : 90 2025-05-07T19:43:01.5102060Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5102136Z cpu family : 6 2025-05-07T19:43:01.5102212Z model : 85 2025-05-07T19:43:01.5102370Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5102456Z stepping : 7 2025-05-07T19:43:01.5102541Z microcode : 0x5003901 2025-05-07T19:43:01.5102617Z cpu MHz : 3126.047 2025-05-07T19:43:01.5102708Z cache size : 36608 KB 2025-05-07T19:43:01.5102788Z physical id : 1 2025-05-07T19:43:01.5102865Z siblings : 48 2025-05-07T19:43:01.5102944Z core id : 18 2025-05-07T19:43:01.5103030Z cpu cores : 24 2025-05-07T19:43:01.5103206Z apicid : 101 2025-05-07T19:43:01.5103292Z initial apicid : 101 2025-05-07T19:43:01.5103372Z fpu : yes 2025-05-07T19:43:01.5103458Z fpu_exception : yes 2025-05-07T19:43:01.5103537Z cpuid level : 13 2025-05-07T19:43:01.5103611Z wp : yes 2025-05-07T19:43:01.5105844Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5106236Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5106330Z bogomips : 5999.99 2025-05-07T19:43:01.5106410Z clflush size : 64 2025-05-07T19:43:01.5106496Z cache_alignment : 64 2025-05-07T19:43:01.5106626Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5106718Z power management: 2025-05-07T19:43:01.5106723Z 2025-05-07T19:43:01.5106801Z processor : 91 2025-05-07T19:43:01.5106887Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5106975Z cpu family : 6 2025-05-07T19:43:01.5107048Z model : 85 2025-05-07T19:43:01.5107207Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5107290Z stepping : 7 2025-05-07T19:43:01.5107376Z microcode : 0x5003901 2025-05-07T19:43:01.5107452Z cpu MHz : 3120.904 2025-05-07T19:43:01.5107532Z cache size : 36608 KB 2025-05-07T19:43:01.5107685Z physical id : 1 2025-05-07T19:43:01.5107764Z siblings : 48 2025-05-07T19:43:01.5107841Z core id : 19 2025-05-07T19:43:01.5107922Z cpu cores : 24 2025-05-07T19:43:01.5108003Z apicid : 103 2025-05-07T19:43:01.5108089Z initial apicid : 103 2025-05-07T19:43:01.5108162Z fpu : yes 2025-05-07T19:43:01.5108250Z fpu_exception : yes 2025-05-07T19:43:01.5108327Z cpuid level : 13 2025-05-07T19:43:01.5108397Z wp : yes 2025-05-07T19:43:01.5110624Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5111016Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5111098Z bogomips : 5999.99 2025-05-07T19:43:01.5111187Z clflush size : 64 2025-05-07T19:43:01.5111268Z cache_alignment : 64 2025-05-07T19:43:01.5111395Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5111476Z power management: 2025-05-07T19:43:01.5111480Z 2025-05-07T19:43:01.5111562Z processor : 92 2025-05-07T19:43:01.5111646Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5111720Z cpu family : 6 2025-05-07T19:43:01.5111801Z model : 85 2025-05-07T19:43:01.5111959Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5112034Z stepping : 7 2025-05-07T19:43:01.5112115Z microcode : 0x5003901 2025-05-07T19:43:01.5112195Z cpu MHz : 3145.800 2025-05-07T19:43:01.5112272Z cache size : 36608 KB 2025-05-07T19:43:01.5112349Z physical id : 1 2025-05-07T19:43:01.5112429Z siblings : 48 2025-05-07T19:43:01.5112504Z core id : 20 2025-05-07T19:43:01.5112587Z cpu cores : 24 2025-05-07T19:43:01.5112670Z apicid : 105 2025-05-07T19:43:01.5112884Z initial apicid : 105 2025-05-07T19:43:01.5113014Z fpu : yes 2025-05-07T19:43:01.5113102Z fpu_exception : yes 2025-05-07T19:43:01.5113200Z cpuid level : 13 2025-05-07T19:43:01.5113279Z wp : yes 2025-05-07T19:43:01.5115430Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5115843Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5115929Z bogomips : 5999.99 2025-05-07T19:43:01.5116018Z clflush size : 64 2025-05-07T19:43:01.5116120Z cache_alignment : 64 2025-05-07T19:43:01.5116253Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5116340Z power management: 2025-05-07T19:43:01.5116344Z 2025-05-07T19:43:01.5116427Z processor : 93 2025-05-07T19:43:01.5116532Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5116616Z cpu family : 6 2025-05-07T19:43:01.5116696Z model : 85 2025-05-07T19:43:01.5116875Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5116958Z stepping : 7 2025-05-07T19:43:01.5117046Z microcode : 0x5003901 2025-05-07T19:43:01.5117129Z cpu MHz : 3156.742 2025-05-07T19:43:01.5117232Z cache size : 36608 KB 2025-05-07T19:43:01.5117319Z physical id : 1 2025-05-07T19:43:01.5117398Z siblings : 48 2025-05-07T19:43:01.5117492Z core id : 21 2025-05-07T19:43:01.5117629Z cpu cores : 24 2025-05-07T19:43:01.5117711Z apicid : 107 2025-05-07T19:43:01.5117798Z initial apicid : 107 2025-05-07T19:43:01.5117892Z fpu : yes 2025-05-07T19:43:01.5117982Z fpu_exception : yes 2025-05-07T19:43:01.5118065Z cpuid level : 13 2025-05-07T19:43:01.5118144Z wp : yes 2025-05-07T19:43:01.5120319Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5120712Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5120810Z bogomips : 5999.99 2025-05-07T19:43:01.5120894Z clflush size : 64 2025-05-07T19:43:01.5120985Z cache_alignment : 64 2025-05-07T19:43:01.5121135Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5121221Z power management: 2025-05-07T19:43:01.5121225Z 2025-05-07T19:43:01.5121309Z processor : 94 2025-05-07T19:43:01.5121399Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5121492Z cpu family : 6 2025-05-07T19:43:01.5121570Z model : 85 2025-05-07T19:43:01.5121727Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5121814Z stepping : 7 2025-05-07T19:43:01.5121900Z microcode : 0x5003901 2025-05-07T19:43:01.5121977Z cpu MHz : 3162.016 2025-05-07T19:43:01.5122068Z cache size : 36608 KB 2025-05-07T19:43:01.5122157Z physical id : 1 2025-05-07T19:43:01.5122234Z siblings : 48 2025-05-07T19:43:01.5122312Z core id : 22 2025-05-07T19:43:01.5122393Z cpu cores : 24 2025-05-07T19:43:01.5122470Z apicid : 109 2025-05-07T19:43:01.5122553Z initial apicid : 109 2025-05-07T19:43:01.5122627Z fpu : yes 2025-05-07T19:43:01.5122711Z fpu_exception : yes 2025-05-07T19:43:01.5122842Z cpuid level : 13 2025-05-07T19:43:01.5122914Z wp : yes 2025-05-07T19:43:01.5125085Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5125481Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5125559Z bogomips : 5999.99 2025-05-07T19:43:01.5125646Z clflush size : 64 2025-05-07T19:43:01.5125733Z cache_alignment : 64 2025-05-07T19:43:01.5125866Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5125958Z power management: 2025-05-07T19:43:01.5125962Z 2025-05-07T19:43:01.5126040Z processor : 95 2025-05-07T19:43:01.5126130Z vendor_id : GenuineIntel 2025-05-07T19:43:01.5126208Z cpu family : 6 2025-05-07T19:43:01.5126293Z model : 85 2025-05-07T19:43:01.5126451Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:01.5126532Z stepping : 7 2025-05-07T19:43:01.5126624Z microcode : 0x5003901 2025-05-07T19:43:01.5126700Z cpu MHz : 3138.882 2025-05-07T19:43:01.5126783Z cache size : 36608 KB 2025-05-07T19:43:01.5126864Z physical id : 1 2025-05-07T19:43:01.5126957Z siblings : 48 2025-05-07T19:43:01.5127032Z core id : 23 2025-05-07T19:43:01.5127109Z cpu cores : 24 2025-05-07T19:43:01.5127187Z apicid : 111 2025-05-07T19:43:01.5127814Z initial apicid : 111 2025-05-07T19:43:01.5127899Z fpu : yes 2025-05-07T19:43:01.5127982Z fpu_exception : yes 2025-05-07T19:43:01.5128074Z cpuid level : 13 2025-05-07T19:43:01.5128154Z wp : yes 2025-05-07T19:43:01.5130340Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:01.5130739Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:01.5130827Z bogomips : 5999.99 2025-05-07T19:43:01.5130911Z clflush size : 64 2025-05-07T19:43:01.5131003Z cache_alignment : 64 2025-05-07T19:43:01.5131132Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:01.5131225Z power management: 2025-05-07T19:43:01.5131230Z 2025-05-07T19:43:01.5131233Z 2025-05-07T19:43:01.5131355Z ################################################################################ 2025-05-07T19:43:01.5131455Z [INFO] Print PCI info ... 2025-05-07T19:43:01.5131531Z + lspci -v 2025-05-07T19:43:01.5131536Z 2025-05-07T19:43:01.5131718Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-05-07T19:43:01.5131837Z Subsystem: Amazon.com, Inc. Device 1237 2025-05-07T19:43:01.5131954Z Flags: bus master, medium devsel, latency 0 2025-05-07T19:43:01.5131959Z 2025-05-07T19:43:01.5132159Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-05-07T19:43:01.5132250Z Physical Slot: 1 2025-05-07T19:43:01.5132365Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:01.5132370Z 2025-05-07T19:43:01.5132630Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-05-07T19:43:01.5132781Z Physical Slot: 1 2025-05-07T19:43:01.5132908Z Flags: bus master, fast devsel, latency 0, IRQ 9 2025-05-07T19:43:01.5132913Z 2025-05-07T19:43:01.5133290Z 00:03.0 VGA compatible controller: Amazon.com, Inc. 
Device 1111 (prog-if 00 [VGA controller]) 2025-05-07T19:43:01.5133375Z Physical Slot: 3 2025-05-07T19:43:01.5133482Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:01.5133609Z Memory at c0000000 (32-bit, prefetchable) [size=4M] 2025-05-07T19:43:01.5133727Z Expansion ROM at 000c0000 [disabled] [size=128K] 2025-05-07T19:43:01.5133741Z 2025-05-07T19:43:01.5134044Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller (prog-if 02 [NVM Express]) 2025-05-07T19:43:01.5134147Z Subsystem: Amazon.com, Inc. Device 0000 2025-05-07T19:43:01.5134226Z Physical Slot: 4 2025-05-07T19:43:01.5134367Z Flags: bus master, fast devsel, latency 0, IRQ 11 2025-05-07T19:43:01.5134514Z Memory at c0514000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:43:01.5134609Z Capabilities: 2025-05-07T19:43:01.5134707Z Kernel driver in use: nvme 2025-05-07T19:43:01.5134712Z 2025-05-07T19:43:01.5134918Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-05-07T19:43:01.5134993Z Physical Slot: 5 2025-05-07T19:43:01.5135096Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:01.5135253Z Memory at c0510000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:43:01.5135379Z Memory at c0400000 (32-bit, prefetchable) [size=1M] 2025-05-07T19:43:01.5135522Z Memory at c0500000 (32-bit, non-prefetchable) [size=64K] 2025-05-07T19:43:01.5135627Z Capabilities: 2025-05-07T19:43:01.5135711Z Kernel driver in use: ena 2025-05-07T19:43:01.5135715Z 2025-05-07T19:43:01.5135719Z 2025-05-07T19:43:01.5135871Z ################################################################################ 2025-05-07T19:43:01.5135987Z [INFO] Print Linux distribution info ... 
2025-05-07T19:43:01.5136062Z + uname -a 2025-05-07T19:43:01.5136067Z 2025-05-07T19:43:01.5136437Z Linux 565b81b7c816 6.1.130-139.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-05-07T19:43:01.5136442Z 2025-05-07T19:43:01.5136514Z + uname -m 2025-05-07T19:43:01.5136518Z 2025-05-07T19:43:01.5136583Z x86_64 2025-05-07T19:43:01.5136587Z 2025-05-07T19:43:01.5136689Z + cat /proc/version 2025-05-07T19:43:01.5136693Z 2025-05-07T19:43:01.5137261Z Linux version 6.1.130-139.222.amzn2023.x86_64 (mockbuild@ip-10-0-55-76) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.39-6.amzn2023.0.11) #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 2025-05-07T19:43:01.5137266Z 2025-05-07T19:43:01.5137342Z + cat /etc/os-release 2025-05-07T19:43:01.5137346Z 2025-05-07T19:43:01.5137422Z NAME="Amazon Linux" 2025-05-07T19:43:01.5137499Z VERSION="2023" 2025-05-07T19:43:01.5137570Z ID="amzn" 2025-05-07T19:43:01.5137642Z ID_LIKE="fedora" 2025-05-07T19:43:01.5137721Z VERSION_ID="2023" 2025-05-07T19:43:01.5137813Z PLATFORM_ID="platform:al2023" 2025-05-07T19:43:01.5137921Z PRETTY_NAME="Amazon Linux 2023.7.20250428" 2025-05-07T19:43:01.5137994Z ANSI_COLOR="0;33" 2025-05-07T19:43:01.5138110Z CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023" 2025-05-07T19:43:01.5138286Z HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/" 2025-05-07T19:43:01.5138445Z DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/" 2025-05-07T19:43:01.5138597Z SUPPORT_URL="https://aws.amazon.com/premiumsupport/" 2025-05-07T19:43:01.5138778Z BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023" 2025-05-07T19:43:01.5138852Z VENDOR_NAME="AWS" 2025-05-07T19:43:01.5138953Z VENDOR_URL="https://aws.amazon.com/" 2025-05-07T19:43:01.5139042Z SUPPORT_END="2029-06-30" 2025-05-07T19:43:01.5139046Z 2025-05-07T19:43:01.5175354Z ##[group]Run . $PRELUDE; print_gpu_info 2025-05-07T19:43:01.5175509Z . 
$PRELUDE; print_gpu_info 2025-05-07T19:43:01.5175778Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:01.5175845Z env: 2025-05-07T19:43:01.5176092Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:01.5176178Z BUILD_ENV: build_binary 2025-05-07T19:43:01.5176258Z BUILD_TARGET: genai 2025-05-07T19:43:01.5176333Z BUILD_VARIANT: cuda 2025-05-07T19:43:01.5176420Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:01.5176494Z ##[endgroup] 2025-05-07T19:43:01.9502694Z ################################################################################ 2025-05-07T19:43:01.9503142Z [INFO] Printing general display info ... 2025-05-07T19:43:01.9527228Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:02.0542372Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:02.0548611Z /usr/bin/sudo 2025-05-07T19:43:02.0558269Z which: no apt-get in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:02.0564761Z /usr/bin/yum 2025-05-07T19:43:02.0565527Z [INSTALL] Updating system repositories ... 2025-05-07T19:43:02.0586940Z [EXEC] [ATTEMPT 0/3] + sudo yum update -y 2025-05-07T19:43:02.2711864Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:43:02.3669679Z Dependencies resolved. 2025-05-07T19:43:02.3883178Z Nothing to do. 2025-05-07T19:43:02.3883852Z Complete! 2025-05-07T19:43:02.4325826Z [INSTALL] Installing system package(s): hostname lshw ... 2025-05-07T19:43:02.4345938Z [EXEC] [ATTEMPT 0/3] + sudo yum install -y hostname lshw 2025-05-07T19:43:02.6456258Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:43:02.6971959Z Dependencies resolved. 
2025-05-07T19:43:02.7138799Z ================================================================================ 2025-05-07T19:43:02.7140422Z Package Arch Version Repository Size 2025-05-07T19:43:02.7141630Z ================================================================================ 2025-05-07T19:43:02.7141949Z Installing: 2025-05-07T19:43:02.7142289Z hostname x86_64 3.23-4.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:43:02.7142793Z lshw x86_64 B.02.19.2-7.amzn2023.0.3 amazonlinux 319 k 2025-05-07T19:43:02.7143089Z 2025-05-07T19:43:02.7143180Z Transaction Summary 2025-05-07T19:43:02.7143452Z ================================================================================ 2025-05-07T19:43:02.7143773Z Install 2 Packages 2025-05-07T19:43:02.7143937Z 2025-05-07T19:43:02.7144040Z Total download size: 347 k 2025-05-07T19:43:02.7144304Z Installed size: 883 k 2025-05-07T19:43:02.7144561Z Downloading Packages: 2025-05-07T19:43:03.0146420Z (1/2): hostname-3.23-4.amzn2023.0.3.x86_64.rpm 1.3 MB/s | 28 kB 00:00 2025-05-07T19:43:03.0206339Z (2/2): lshw-B.02.19.2-7.amzn2023.0.3.x86_64.rpm 12 MB/s | 319 kB 00:00 2025-05-07T19:43:03.0217606Z -------------------------------------------------------------------------------- 2025-05-07T19:43:03.0221384Z Total 1.1 MB/s | 347 kB 00:00 2025-05-07T19:43:03.0447453Z Running transaction check 2025-05-07T19:43:03.0501080Z Transaction check succeeded. 2025-05-07T19:43:03.0501992Z Running transaction test 2025-05-07T19:43:03.0657723Z Transaction test succeeded. 
2025-05-07T19:43:03.0658601Z Running transaction 2025-05-07T19:43:03.0982997Z Preparing : 1/1 2025-05-07T19:43:03.1096803Z Installing : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:03.1163720Z Installing : hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:04.1441355Z Running scriptlet: hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:04.1443562Z Verifying : hostname-3.23-4.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:04.1806399Z Verifying : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:04.1807562Z 2025-05-07T19:43:04.1807673Z Installed: 2025-05-07T19:43:04.1808022Z hostname-3.23-4.amzn2023.0.3.x86_64 lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2025-05-07T19:43:04.1808633Z 2025-05-07T19:43:04.1808740Z Complete! 2025-05-07T19:43:04.2163826Z + hostname 2025-05-07T19:43:04.2164132Z 2025-05-07T19:43:04.2172422Z 565b81b7c816 2025-05-07T19:43:04.2172856Z 2025-05-07T19:43:04.2173208Z + sudo lshw -C display 2025-05-07T19:43:04.2173391Z 2025-05-07T19:43:04.4124471Z *-display UNCLAIMED 2025-05-07T19:43:04.4125398Z description: VGA compatible controller 2025-05-07T19:43:04.4126397Z product: Amazon.com, Inc. 2025-05-07T19:43:04.4127255Z vendor: Amazon.com, Inc. 2025-05-07T19:43:04.4128033Z physical id: 3 2025-05-07T19:43:04.4128538Z bus info: pci@0000:00:03.0 2025-05-07T19:43:04.4128817Z version: 00 2025-05-07T19:43:04.4129091Z width: 32 bits 2025-05-07T19:43:04.4129325Z clock: 33MHz 2025-05-07T19:43:04.4129601Z capabilities: vga_controller bus_master 2025-05-07T19:43:04.4129960Z configuration: latency=0 2025-05-07T19:43:04.4130299Z resources: memory:c0000000-c03fffff memory:c0000-dffff 2025-05-07T19:43:04.4142715Z 2025-05-07T19:43:04.4143063Z ################################################################################ 2025-05-07T19:43:04.4143433Z [INFO] Printing NVIDIA GPU info ... 
2025-05-07T19:43:04.4250621Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:04.4271059Z which: no nvidia-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:04.4272510Z [CHECK] nvidia-smi not found 2025-05-07T19:43:04.4273401Z ################################################################################ 2025-05-07T19:43:04.4274392Z [INFO] Printing AMD GPU info ... 2025-05-07T19:43:04.4377639Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:04.4401918Z which: no rocminfo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:04.4402424Z [CHECK] rocminfo not found 2025-05-07T19:43:04.4411558Z which: no rocm-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:04.4412985Z [CHECK] rocm-smi not found 2025-05-07T19:43:04.4487524Z ##[group]Run . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:04.4488045Z . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:04.4488994Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:04.4489377Z env: 2025-05-07T19:43:04.4489628Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:04.4489986Z BUILD_ENV: build_binary 2025-05-07T19:43:04.4490255Z BUILD_TARGET: genai 2025-05-07T19:43:04.4490533Z BUILD_VARIANT: cuda 2025-05-07T19:43:04.4490818Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:04.4491090Z ##[endgroup] 2025-05-07T19:43:04.9535380Z ################################################################################ 2025-05-07T19:43:04.9536439Z # Setup Miniconda 2025-05-07T19:43:04.9537077Z # 2025-05-07T19:43:04.9545602Z # [2025-05-07T19:43:04.954Z] + setup_miniconda /github/home/miniconda 2025-05-07T19:43:04.9546870Z ################################################################################ 2025-05-07T19:43:04.9547795Z 2025-05-07T19:43:04.9565092Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:05.0405508Z [CHECK] Network 
does not appear to be blocked. 2025-05-07T19:43:05.0406614Z + mkdir -p /github/home/miniconda 2025-05-07T19:43:05.0407215Z 2025-05-07T19:43:05.0429056Z 2025-05-07T19:43:05.0429274Z [SETUP] Downloading the Miniconda installer ... 2025-05-07T19:43:05.0450656Z [EXEC] [ATTEMPT 0/3] + wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh 2025-05-07T19:43:06.0100036Z [SETUP] Installing Miniconda ... 2025-05-07T19:43:06.0101625Z + bash miniconda.sh -b -p /github/home/miniconda -u 2025-05-07T19:43:06.0102459Z 2025-05-07T19:43:06.0236392Z PREFIX=/github/home/miniconda 2025-05-07T19:43:06.3760737Z Unpacking payload ... 2025-05-07T19:43:06.8595103Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:07.5310846Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:09.3736857Z 2025-05-07T19:43:09.3737320Z Installing base environment... 2025-05-07T19:43:09.3737589Z 2025-05-07T19:43:10.3665235Z Preparing transaction: ...working... done 2025-05-07T19:43:13.2583784Z Executing transaction: ...working... done 2025-05-07T19:43:13.8090598Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:13.8784589Z installation finished. 2025-05-07T19:43:13.8787823Z 2025-05-07T19:43:13.8788243Z + rm -f miniconda.sh 2025-05-07T19:43:13.8788415Z 2025-05-07T19:43:13.8937696Z 2025-05-07T19:43:13.8938245Z [SETUP] Reloading the bash configuration ... 
2025-05-07T19:43:13.8939408Z + /github/home/miniconda/bin/conda init bash 2025-05-07T19:43:13.8940337Z 2025-05-07T19:43:14.2560809Z no change /github/home/miniconda/condabin/conda 2025-05-07T19:43:14.2562046Z no change /github/home/miniconda/bin/conda 2025-05-07T19:43:14.2563117Z no change /github/home/miniconda/bin/conda-env 2025-05-07T19:43:14.2564239Z no change /github/home/miniconda/bin/activate 2025-05-07T19:43:14.2565316Z no change /github/home/miniconda/bin/deactivate 2025-05-07T19:43:14.2566449Z no change /github/home/miniconda/etc/profile.d/conda.sh 2025-05-07T19:43:14.2566874Z no change /github/home/miniconda/etc/fish/conf.d/conda.fish 2025-05-07T19:43:14.2567322Z no change /github/home/miniconda/shell/condabin/Conda.psm1 2025-05-07T19:43:14.2567791Z no change /github/home/miniconda/shell/condabin/conda-hook.ps1 2025-05-07T19:43:14.2568325Z no change /github/home/miniconda/lib/python3.13/site-packages/xontrib/conda.xsh 2025-05-07T19:43:14.2569199Z no change /github/home/miniconda/etc/profile.d/conda.csh 2025-05-07T19:43:14.2569577Z modified /github/home/.bashrc 2025-05-07T19:43:14.2569786Z 2025-05-07T19:43:14.2570004Z ==> For changes to take effect, close and re-open your current shell. <== 2025-05-07T19:43:14.2570313Z 2025-05-07T19:43:14.3082784Z 2025-05-07T19:43:14.3083388Z + . /github/home/.bashrc 2025-05-07T19:43:14.3084006Z 2025-05-07T19:43:15.1105900Z 2025-05-07T19:43:15.1106555Z [SETUP] Installing libmamba-solver (required since Anaconda 2024.02-1) and libarchive ... 
2025-05-07T19:43:15.1135283Z [EXEC] [ATTEMPT 0/3] + conda install --solver=classic -c conda-forge --override-channels -y conda-libmamba-solver libmamba libmambapy libarchive 2025-05-07T19:43:26.9733708Z Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:28.4266346Z Solving environment: | / - \ | / - \ | / - done 2025-05-07T19:43:28.5160264Z 2025-05-07T19:43:28.5160740Z ## Package Plan ## 2025-05-07T19:43:28.5160945Z 2025-05-07T19:43:28.5161100Z environment location: /github/home/miniconda 2025-05-07T19:43:28.5161432Z 2025-05-07T19:43:28.5161531Z added / updated specs: 2025-05-07T19:43:28.5161819Z - conda-libmamba-solver 2025-05-07T19:43:28.5162085Z - libarchive 2025-05-07T19:43:28.5162317Z - libmamba 2025-05-07T19:43:28.5162526Z - libmambapy 2025-05-07T19:43:28.5162677Z 2025-05-07T19:43:28.5162681Z 2025-05-07T19:43:28.5162807Z The following packages will be downloaded: 2025-05-07T19:43:28.5163038Z 2025-05-07T19:43:28.5163170Z package | build 2025-05-07T19:43:28.5163519Z ---------------------------|----------------- 2025-05-07T19:43:28.5164318Z ca-certificates-2025.4.26 | hbd8a1cb_0 149 KB conda-forge 2025-05-07T19:43:28.5164826Z certifi-2025.4.26 | pyhd8ed1ab_0 154 KB conda-forge 2025-05-07T19:43:28.5165312Z conda-25.3.1 | py313h78bf25f_1 1.1 MB conda-forge 2025-05-07T19:43:28.5165831Z conda-libmamba-solver-25.4.0| pyhd8ed1ab_0 41 KB conda-forge 2025-05-07T19:43:28.5166307Z ------------------------------------------------------------ 2025-05-07T19:43:28.5166685Z Total: 1.4 MB 2025-05-07T19:43:28.5166910Z 2025-05-07T19:43:28.5167030Z The following packages will be UPDATED: 2025-05-07T19:43:28.5167351Z 2025-05-07T19:43:28.5172814Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 
2025-05-07T19:43:28.5173701Z conda pkgs/main::conda-25.3.1-py313h06a4308~ --> conda-forge::conda-25.3.1-py313h78bf25f_1 2025-05-07T19:43:28.5174154Z 2025-05-07T19:43:28.5174389Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:43:28.5174753Z 2025-05-07T19:43:28.5175099Z certifi pkgs/main/linux-64::certifi-2025.4.26~ --> conda-forge/noarch::certifi-2025.4.26-pyhd8ed1ab_0 2025-05-07T19:43:28.5175985Z conda-libmamba-so~ pkgs/main::conda-libmamba-solver-25.4~ --> conda-forge::conda-libmamba-solver-25.4.0-pyhd8ed1ab_0 2025-05-07T19:43:28.5176521Z 2025-05-07T19:43:28.5176525Z 2025-05-07T19:43:28.5176529Z 2025-05-07T19:43:28.5176686Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:28.5177120Z conda-25.3.1 | 1.1 MB | | 0% 2025-05-07T19:43:28.5177376Z 2025-05-07T19:43:28.5177750Z certifi-2025.4.26 | 154 KB | | 0%  2025-05-07T19:43:28.5178025Z 2025-05-07T19:43:28.5178029Z 2025-05-07T19:43:28.5178263Z ca-certificates-2025 | 149 KB | | 0%  2025-05-07T19:43:28.5178546Z 2025-05-07T19:43:28.5178550Z 2025-05-07T19:43:28.5178766Z 2025-05-07T19:43:28.6019393Z conda-libmamba-solve | 41 KB | | 0%  2025-05-07T19:43:28.6020057Z 2025-05-07T19:43:28.6073630Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:28.6073932Z 2025-05-07T19:43:28.6074239Z 2025-05-07T19:43:28.6143292Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:28.6212222Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:28.6212522Z 2025-05-07T19:43:28.6214394Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:28.6214670Z 2025-05-07T19:43:28.6219853Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:28.6220131Z 2025-05-07T19:43:28.6220136Z 2025-05-07T19:43:28.6221931Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:28.6222211Z 2025-05-07T19:43:28.6223182Z 2025-05-07T19:43:28.6289043Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:28.6289450Z 
2025-05-07T19:43:28.6444865Z conda-libmamba-solve | 41 KB | ########## | 100% 2025-05-07T19:43:28.7224920Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:28.7234900Z done 2025-05-07T19:43:28.8242103Z Preparing transaction: done 2025-05-07T19:43:28.9252306Z Verifying transaction: done 2025-05-07T19:43:30.2287706Z Executing transaction: done 2025-05-07T19:43:31.8041134Z [SETUP] Updating Miniconda base packages ... 
2025-05-07T19:43:31.8065323Z [EXEC] [ATTEMPT 0/3] + conda update -n base -c defaults --update-deps -y conda 2025-05-07T19:43:32.5331084Z Channels: 2025-05-07T19:43:32.5331370Z - defaults 2025-05-07T19:43:32.5331604Z Platform: linux-64 2025-05-07T19:43:33.5993515Z Collecting package metadata (repodata.json): - \ | / - \ done 2025-05-07T19:43:33.7292487Z Solving environment: / - Channels: 2025-05-07T19:43:33.7292911Z - defaults 2025-05-07T19:43:33.7293226Z Platform: linux-64 2025-05-07T19:43:34.0700840Z Collecting package metadata (repodata.json): | / - \ | / done 2025-05-07T19:43:34.2997615Z Solving environment: \ | / - done 2025-05-07T19:43:34.3898024Z done 2025-05-07T19:43:34.4541543Z 2025-05-07T19:43:34.4541816Z ## Package Plan ## 2025-05-07T19:43:34.4542009Z 2025-05-07T19:43:34.4542152Z environment location: /github/home/miniconda 2025-05-07T19:43:34.4542421Z 2025-05-07T19:43:34.4542522Z added / updated specs: 2025-05-07T19:43:34.4542774Z - conda 2025-05-07T19:43:34.4542914Z 2025-05-07T19:43:34.4542920Z 2025-05-07T19:43:34.4543060Z The following packages will be downloaded: 2025-05-07T19:43:34.4543300Z 2025-05-07T19:43:34.4543433Z package | build 2025-05-07T19:43:34.4543771Z ---------------------------|----------------- 2025-05-07T19:43:34.4544147Z pip-25.1 | pyhc872135_2 1.3 MB 2025-05-07T19:43:34.4544555Z tzdata-2025b | h04d1e81_0 116 KB 2025-05-07T19:43:34.4545286Z ------------------------------------------------------------ 2025-05-07T19:43:34.4545718Z Total: 1.4 MB 2025-05-07T19:43:34.4545959Z 2025-05-07T19:43:34.4546096Z The following packages will be UPDATED: 2025-05-07T19:43:34.4546333Z 2025-05-07T19:43:34.4546700Z pip pkgs/main/linux-64::pip-25.0-py313h06~ --> pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:34.4547277Z tzdata 2025a-h04d1e81_0 --> 2025b-h04d1e81_0 2025-05-07T19:43:34.4547584Z 2025-05-07T19:43:34.4547588Z 2025-05-07T19:43:34.4547591Z 2025-05-07T19:43:34.4547751Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:43:34.4548195Z pip-25.1 | 1.3 MB | | 0% 2025-05-07T19:43:34.4980417Z tzdata-2025b | 116 KB | | 0% 2025-05-07T19:43:34.5143248Z tzdata-2025b | 116 KB | ########## | 100% 2025-05-07T19:43:34.6977280Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:34.7035976Z done 2025-05-07T19:43:34.8050143Z Preparing transaction: done 2025-05-07T19:43:34.9065555Z Verifying transaction: done 2025-05-07T19:43:36.9115496Z Executing transaction: done 2025-05-07T19:43:37.4535326Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:43:37.4536124Z + conda clean --packages --tarball -y 2025-05-07T19:43:37.4536397Z 2025-05-07T19:43:37.8904933Z Will remove 99 (117.8 MB) tarball(s). 2025-05-07T19:43:37.8905328Z Will remove 11 (16.0 MB) package(s). 2025-05-07T19:43:37.9462311Z 2025-05-07T19:43:37.9467505Z + conda clean --all -y 2025-05-07T19:43:37.9467733Z 2025-05-07T19:43:38.3895477Z There are no unused tarball(s) to remove. 2025-05-07T19:43:38.3895977Z Will remove 1 index cache(s). 2025-05-07T19:43:38.3896346Z There are no unused package(s) to remove. 2025-05-07T19:43:38.3896714Z There are no tempfile(s) to remove. 2025-05-07T19:43:38.3897075Z There are no logfile(s) to remove. 
2025-05-07T19:43:38.4441197Z 2025-05-07T19:43:38.4443980Z + conda info 2025-05-07T19:43:38.4444172Z 2025-05-07T19:43:39.0063830Z 2025-05-07T19:43:39.0064330Z active environment : base 2025-05-07T19:43:39.0064753Z active env location : /github/home/miniconda 2025-05-07T19:43:39.0065102Z shell level : 1 2025-05-07T19:43:39.0065455Z user config file : /github/home/.condarc 2025-05-07T19:43:39.0065855Z populated config files : /github/home/miniconda/.condarc 2025-05-07T19:43:39.0066257Z conda version : 25.3.1 2025-05-07T19:43:39.0066549Z conda-build version : not installed 2025-05-07T19:43:39.0066874Z python version : 3.13.2.final.0 2025-05-07T19:43:39.0067203Z solver : libmamba (default) 2025-05-07T19:43:39.0067545Z virtual packages : __archspec=1=cascadelake 2025-05-07T19:43:39.0067892Z __conda=25.3.1=0 2025-05-07T19:43:39.0068183Z __glibc=2.34=0 2025-05-07T19:43:39.0068487Z __linux=6.1.130=0 2025-05-07T19:43:39.0068772Z __unix=0=0 2025-05-07T19:43:39.0069133Z base environment : /github/home/miniconda (writable) 2025-05-07T19:43:39.0069552Z conda av data dir : /github/home/miniconda/etc/conda 2025-05-07T19:43:39.0069942Z conda av metadata url : None 2025-05-07T19:43:39.0070668Z channel URLs : https://repo.anaconda.com/pkgs/main/linux-64 2025-05-07T19:43:39.0071140Z https://repo.anaconda.com/pkgs/main/noarch 2025-05-07T19:43:39.0071578Z https://repo.anaconda.com/pkgs/r/linux-64 2025-05-07T19:43:39.0071973Z https://repo.anaconda.com/pkgs/r/noarch 2025-05-07T19:43:39.0072368Z package cache : /github/home/miniconda/pkgs 2025-05-07T19:43:39.0072714Z /github/home/.conda/pkgs 2025-05-07T19:43:39.0073091Z envs directories : /github/home/miniconda/envs 2025-05-07T19:43:39.0073458Z /github/home/.conda/envs 2025-05-07T19:43:39.0073772Z platform : linux-64 2025-05-07T19:43:39.0074705Z user-agent : conda/25.3.1 requests/2.32.3 CPython/3.13.2 Linux/6.1.130-139.222.amzn2023.x86_64 amzn/2023.7.20250428 glibc/2.34 solver/libmamba conda-libmamba-solver/25.4.0 libmambapy/2.0.5 
aau/0.7.0 c/. s/. e/. 2025-05-07T19:43:39.0075637Z UID:GID : 0:0 2025-05-07T19:43:39.0075919Z netrc file : None 2025-05-07T19:43:39.0076187Z offline mode : False 2025-05-07T19:43:39.0076492Z 2025-05-07T19:43:39.0662664Z 2025-05-07T19:43:39.0663104Z [SETUP] Exporting Miniconda variables ... 2025-05-07T19:43:39.0663848Z [SETUP] Saving Miniconda variables to /__w/_temp/_runner_file_commands/add_path_a988bb89-a996-44e5-a5bd-067d07fb8c63 ... 2025-05-07T19:43:39.0665905Z [SETUP] Successfully set up Miniconda at /github/home/miniconda 2025-05-07T19:43:39.0856716Z ##[group]Run . $PRELUDE; create_conda_environment $BUILD_ENV 3.13 2025-05-07T19:43:39.0857313Z . $PRELUDE; create_conda_environment $BUILD_ENV 3.13 2025-05-07T19:43:39.0858129Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:39.0858504Z env: 2025-05-07T19:43:39.0858759Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:39.0859114Z BUILD_ENV: build_binary 2025-05-07T19:43:39.0859767Z BUILD_TARGET: genai 2025-05-07T19:43:39.0860208Z BUILD_VARIANT: cuda 2025-05-07T19:43:39.0860509Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:39.0860854Z ##[endgroup] 2025-05-07T19:43:39.5213739Z ################################################################################ 2025-05-07T19:43:39.5214162Z # Create Conda Environment 2025-05-07T19:43:39.5214423Z # 2025-05-07T19:43:39.5229482Z # [2025-05-07T19:43:39.522Z] + create_conda_environment build_binary 3.13 2025-05-07T19:43:39.5229994Z ################################################################################ 2025-05-07T19:43:39.5230234Z 2025-05-07T19:43:39.5249412Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:39.6097252Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:39.6098415Z [SETUP] Listing existing Conda environments ... 
2025-05-07T19:43:39.6099410Z + conda info --envs 2025-05-07T19:43:39.6100065Z 2025-05-07T19:43:40.1868583Z 2025-05-07T19:43:40.1868921Z # conda environments: 2025-05-07T19:43:40.1869315Z # 2025-05-07T19:43:40.1869559Z base /github/home/miniconda 2025-05-07T19:43:40.1869816Z 2025-05-07T19:43:40.2456936Z 2025-05-07T19:43:40.2457592Z [SETUP] Deleting the prefix directory if it exists ... 2025-05-07T19:43:41.8525833Z + rm -rf /github/home/miniconda/envs/build_binary 2025-05-07T19:43:41.8526142Z 2025-05-07T19:43:41.8541423Z 2025-05-07T19:43:41.8558803Z [SETUP] Creating new Conda environment (Python 3.13) ... 2025-05-07T19:43:41.8586928Z [EXEC] [ATTEMPT 0/3] + conda create -y -n build_binary python=3.13 2025-05-07T19:43:42.4408900Z Channels: 2025-05-07T19:43:42.4409548Z - defaults 2025-05-07T19:43:42.4410182Z Platform: linux-64 2025-05-07T19:43:43.8182795Z Collecting package metadata (repodata.json): - \ | / - \ | / - done 2025-05-07T19:43:43.9185894Z Solving environment: | done 2025-05-07T19:43:43.9483284Z 2025-05-07T19:43:43.9483600Z ## Package Plan ## 2025-05-07T19:43:43.9484090Z 2025-05-07T19:43:43.9484783Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:43:43.9485161Z 2025-05-07T19:43:43.9485275Z added / updated specs: 2025-05-07T19:43:43.9485566Z - python=3.13 2025-05-07T19:43:43.9485710Z 2025-05-07T19:43:43.9485714Z 2025-05-07T19:43:43.9485845Z The following packages will be downloaded: 2025-05-07T19:43:43.9486081Z 2025-05-07T19:43:43.9486222Z package | build 2025-05-07T19:43:43.9486565Z ---------------------------|----------------- 2025-05-07T19:43:43.9486975Z _libgcc_mutex-0.1 | main 3 KB 2025-05-07T19:43:43.9487399Z _openmp_mutex-5.1 | 1_gnu 21 KB 2025-05-07T19:43:43.9487862Z ca-certificates-2025.2.25 | h06a4308_0 129 KB 2025-05-07T19:43:43.9488327Z python_abi-3.13 | 0_cp313 6 KB 2025-05-07T19:43:43.9488723Z ------------------------------------------------------------ 2025-05-07T19:43:43.9489114Z Total: 159 KB 
2025-05-07T19:43:43.9489336Z 2025-05-07T19:43:43.9489473Z The following NEW packages will be INSTALLED: 2025-05-07T19:43:43.9489732Z 2025-05-07T19:43:43.9489950Z _libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main 2025-05-07T19:43:43.9490443Z _openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 2025-05-07T19:43:43.9490894Z bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 2025-05-07T19:43:43.9491786Z ca-certificates pkgs/main/linux-64::ca-certificates-2025.2.25-h06a4308_0 2025-05-07T19:43:43.9492334Z expat pkgs/main/linux-64::expat-2.7.1-h6a678d5_0 2025-05-07T19:43:43.9492840Z ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 2025-05-07T19:43:43.9493354Z libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 2025-05-07T19:43:43.9493813Z libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 2025-05-07T19:43:43.9494306Z libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 2025-05-07T19:43:43.9494919Z libmpdec pkgs/main/linux-64::libmpdec-4.0.0-h5eee18b_0 2025-05-07T19:43:43.9495439Z libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 2025-05-07T19:43:43.9495947Z libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 2025-05-07T19:43:43.9496393Z ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 2025-05-07T19:43:43.9496861Z openssl pkgs/main/linux-64::openssl-3.0.16-h5eee18b_0 2025-05-07T19:43:43.9497296Z pip pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:43.9497767Z python pkgs/main/linux-64::python-3.13.2-hf623796_100_cp313 2025-05-07T19:43:43.9498263Z python_abi pkgs/main/linux-64::python_abi-3.13-0_cp313 2025-05-07T19:43:43.9498720Z readline pkgs/main/linux-64::readline-8.2-h5eee18b_0 2025-05-07T19:43:43.9499243Z setuptools pkgs/main/linux-64::setuptools-78.1.1-py313h06a4308_0 2025-05-07T19:43:43.9499871Z sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 2025-05-07T19:43:43.9500569Z tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0 2025-05-07T19:43:43.9500974Z tzdata pkgs/main/noarch::tzdata-2025b-h04d1e81_0 
2025-05-07T19:43:43.9501435Z wheel pkgs/main/linux-64::wheel-0.45.1-py313h06a4308_0 2025-05-07T19:43:43.9501875Z xz pkgs/main/linux-64::xz-5.6.4-h5eee18b_1 2025-05-07T19:43:43.9502267Z zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 2025-05-07T19:43:43.9502556Z 2025-05-07T19:43:43.9502560Z 2025-05-07T19:43:43.9502564Z 2025-05-07T19:43:43.9502721Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:44.0029731Z _libgcc_mutex-0.1 | 3 KB | ########## | 100% 2025-05-07T19:43:44.0127400Z python_abi-3.13 | 6 KB | ########## | 100% 2025-05-07T19:43:44.0176800Z ca-certificates-2025 | 129 KB | ########## | 100% 2025-05-07T19:43:44.0179943Z _openmp_mutex-5.1 | 21 KB | ########## | 100%
2025-05-07T19:43:44.0182977Z 2025-05-07T19:43:44.0182980Z 2025-05-07T19:43:44.0183169Z  done 2025-05-07T19:43:44.2292969Z Preparing transaction: - \ done 2025-05-07T19:43:45.7763451Z Verifying transaction: / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:47.9868878Z Executing transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:47.9906269Z # 2025-05-07T19:43:47.9906557Z # To activate this environment, use 2025-05-07T19:43:47.9906899Z # 2025-05-07T19:43:47.9907113Z # $ conda activate build_binary 2025-05-07T19:43:47.9907421Z # 2025-05-07T19:43:47.9907655Z # To deactivate an active environment, use 2025-05-07T19:43:47.9907993Z # 2025-05-07T19:43:47.9908222Z # $ conda deactivate 2025-05-07T19:43:47.9908391Z 2025-05-07T19:43:48.0742156Z [SETUP] Upgrading PIP to latest ... 2025-05-07T19:43:48.0778924Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --upgrade pip 2025-05-07T19:43:51.0419926Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:43:51.0421503Z 2025-05-07T19:43:51.0421950Z Requirement already satisfied: pip in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (25.1) 2025-05-07T19:43:51.0422572Z Collecting pip 2025-05-07T19:43:51.0422913Z Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB) 2025-05-07T19:43:51.0423367Z Downloading pip-25.1.1-py3-none-any.whl (1.8 MB) 2025-05-07T19:43:51.0424266Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 62.1 MB/s eta 0:00:00 2025-05-07T19:43:51.0424739Z Installing collected packages: pip 2025-05-07T19:43:51.0425073Z Attempting uninstall: pip 2025-05-07T19:43:51.0425414Z Found existing installation: pip 25.1 2025-05-07T19:43:51.0425755Z Uninstalling pip-25.1: 2025-05-07T19:43:51.0426099Z Successfully uninstalled pip-25.1 2025-05-07T19:43:51.0426444Z Successfully installed pip-25.1.1 2025-05-07T19:43:51.0426694Z 2025-05-07T19:43:51.1023775Z [SETUP] Upgrading pyOpenSSL ... 2025-05-07T19:43:51.1054512Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y pyOpenSSL>22.1.0 2025-05-07T19:43:51.7671591Z Channels: 2025-05-07T19:43:51.7672277Z - conda-forge 2025-05-07T19:43:51.7672926Z Platform: linux-64 2025-05-07T19:44:01.5696171Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:44:03.4757104Z Solving environment: / - \ | / done 2025-05-07T19:44:03.5238181Z 2025-05-07T19:44:03.5238846Z ## Package Plan ## 2025-05-07T19:44:03.5239355Z 2025-05-07T19:44:03.5239763Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:03.5240133Z 2025-05-07T19:44:03.5240251Z added / updated specs: 2025-05-07T19:44:03.5240592Z - pyopenssl[version='>22.1.0'] 2025-05-07T19:44:03.5240810Z 2025-05-07T19:44:03.5240815Z 2025-05-07T19:44:03.5241801Z The following packages will be downloaded: 2025-05-07T19:44:03.5242673Z 2025-05-07T19:44:03.5242895Z package | build 2025-05-07T19:44:03.5243717Z 
---------------------------|----------------- 2025-05-07T19:44:03.5244454Z cffi-1.17.1 | py313hfab6e84_0 289 KB conda-forge 2025-05-07T19:44:03.5244957Z cryptography-44.0.3 | py313h6556f6e_0 1.5 MB conda-forge 2025-05-07T19:44:03.5245469Z libgcc-15.1.0 | h767d61c_2 810 KB conda-forge 2025-05-07T19:44:03.5245931Z libgcc-ng-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:44:03.5246733Z libgomp-15.1.0 | h767d61c_2 442 KB conda-forge 2025-05-07T19:44:03.5247230Z openssl-3.5.0 | h7b32b05_1 3.0 MB conda-forge 2025-05-07T19:44:03.5247704Z pycparser-2.22 | pyh29332c3_1 108 KB conda-forge 2025-05-07T19:44:03.5248226Z pyopenssl-25.0.0 | pyhd8ed1ab_0 120 KB conda-forge 2025-05-07T19:44:03.5248899Z typing-extensions-4.13.2 | h0e9735f_0 88 KB conda-forge 2025-05-07T19:44:03.5249595Z typing_extensions-4.13.2 | pyh29332c3_0 51 KB conda-forge 2025-05-07T19:44:03.5250204Z ------------------------------------------------------------ 2025-05-07T19:44:03.5250559Z Total: 6.4 MB 2025-05-07T19:44:03.5250780Z 2025-05-07T19:44:03.5250942Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:03.5251177Z 2025-05-07T19:44:03.5251387Z cffi conda-forge/linux-64::cffi-1.17.1-py313hfab6e84_0 2025-05-07T19:44:03.5251927Z cryptography conda-forge/linux-64::cryptography-44.0.3-py313h6556f6e_0 2025-05-07T19:44:03.5252462Z libgcc conda-forge/linux-64::libgcc-15.1.0-h767d61c_2 2025-05-07T19:44:03.5252980Z pycparser conda-forge/noarch::pycparser-2.22-pyh29332c3_1 2025-05-07T19:44:03.5253497Z pyopenssl conda-forge/noarch::pyopenssl-25.0.0-pyhd8ed1ab_0 2025-05-07T19:44:03.5254055Z typing-extensions conda-forge/noarch::typing-extensions-4.13.2-h0e9735f_0 2025-05-07T19:44:03.5254704Z typing_extensions conda-forge/noarch::typing_extensions-4.13.2-pyh29332c3_0 2025-05-07T19:44:03.5255235Z 2025-05-07T19:44:03.5255395Z The following packages will be UPDATED: 2025-05-07T19:44:03.5255617Z 2025-05-07T19:44:03.5256041Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> 
conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 2025-05-07T19:44:03.5256903Z libgcc-ng pkgs/main::libgcc-ng-11.2.0-h1234567_1 --> conda-forge::libgcc-ng-15.1.0-h69a702a_2 2025-05-07T19:44:03.5257634Z libgomp pkgs/main::libgomp-11.2.0-h1234567_1 --> conda-forge::libgomp-15.1.0-h767d61c_2 2025-05-07T19:44:03.5258974Z openssl pkgs/main::openssl-3.0.16-h5eee18b_0 --> conda-forge::openssl-3.5.0-h7b32b05_1 2025-05-07T19:44:03.5259383Z 2025-05-07T19:44:03.5259415Z 2025-05-07T19:44:03.5259418Z 2025-05-07T19:44:03.5259706Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:03.6574438Z cffi-1.17.1 | 289 KB | ########## | 100% 2025-05-07T19:44:03.6689818Z libgomp-15.1.0 | 442 KB | ########## | 100% 2025-05-07T19:44:03.6850341Z pyopenssl-25.0.0 | 120 KB | ########## | 100% 2025-05-07T19:44:03.7887657Z libgcc-15.1.0 | 810 KB | ########## | 100% 2025-05-07T19:44:03.8009751Z pycparser-2.22 | 108 KB | ########## | 100% 2025-05-07T19:44:03.8327983Z typing-extensions-4. | 88 KB | ########## | 100% 2025-05-07T19:44:03.8460114Z libgcc-ng-15.1.0 | 34 KB | ########## | 100% 2025-05-07T19:44:03.8463262Z openssl-3.5.0 | 3.0 MB | ########## | 100% 2025-05-07T19:44:03.8533380Z typing_extensions-4. | 51 KB | ########## | 100% 2025-05-07T19:44:03.8540391Z cryptography-44.0.3 | 1.5 MB | ########## | 100% 2025-05-07T19:44:03.8553944Z done 2025-05-07T19:44:03.9550946Z Preparing transaction: \ done 2025-05-07T19:44:04.0558770Z Verifying transaction: / done
2025-05-07T19:44:05.4590683Z Executing transaction: \ | / - \ | / - \ | / - \ | done 2025-05-07T19:44:05.5563787Z [SETUP] Testing pyOpenSSL import ... 2025-05-07T19:44:07.2421683Z [CHECK] Python (sub-)package 'OpenSSL' found ... 2025-05-07T19:44:07.2440478Z [SETUP] Installing libxcrypt ... 2025-05-07T19:44:07.2465426Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y libxcrypt 2025-05-07T19:44:07.9128038Z Channels: 2025-05-07T19:44:07.9129744Z - conda-forge 2025-05-07T19:44:07.9130483Z Platform: linux-64 2025-05-07T19:44:10.9540902Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:11.3762742Z Solving environment: \ done 2025-05-07T19:44:11.4232418Z 2025-05-07T19:44:11.4232968Z ## Package Plan ## 2025-05-07T19:44:11.4233845Z 2025-05-07T19:44:11.4234156Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:11.4234494Z 2025-05-07T19:44:11.4234604Z added / updated specs: 2025-05-07T19:44:11.4234886Z - libxcrypt 2025-05-07T19:44:11.4235143Z 2025-05-07T19:44:11.4235147Z 2025-05-07T19:44:11.4235278Z The following packages will be downloaded: 2025-05-07T19:44:11.4235525Z 2025-05-07T19:44:11.4235770Z package | build 2025-05-07T19:44:11.4236131Z ---------------------------|----------------- 2025-05-07T19:44:11.4236526Z libxcrypt-4.4.36 | hd590300_1 98 KB conda-forge 2025-05-07T19:44:11.4236976Z ------------------------------------------------------------ 2025-05-07T19:44:11.4237334Z Total: 98 KB 2025-05-07T19:44:11.4237577Z 2025-05-07T19:44:11.4237712Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:11.4237948Z 2025-05-07T19:44:11.4238207Z libxcrypt conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 2025-05-07T19:44:11.4238515Z 2025-05-07T19:44:11.4238519Z 2025-05-07T19:44:11.4238522Z 2025-05-07T19:44:11.4238672Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:11.5530814Z libxcrypt-4.4.36 | 98 KB | | 0% 2025-05-07T19:44:11.5558105Z libxcrypt-4.4.36 | 98 KB | #6 | 16% 2025-05-07T19:44:11.5660980Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:11.5661451Z libxcrypt-4.4.36 | 98 KB | ########## | 100% 2025-05-07T19:44:11.5661808Z 2025-05-07T19:44:11.5662122Z done 2025-05-07T19:44:11.6669804Z Preparing transaction: / done 2025-05-07T19:44:11.7677989Z Verifying transaction: \ done 2025-05-07T19:44:11.8687242Z Executing transaction: / done 2025-05-07T19:44:15.1570344Z [SETUP] Copying over ... 2025-05-07T19:44:15.1572577Z + cp /github/home/miniconda/envs/build_binary/include/crypt.h /github/home/miniconda/envs/build_binary/include/python3.13/crypt.h 2025-05-07T19:44:15.1574380Z 2025-05-07T19:44:15.1617924Z 2025-05-07T19:44:16.7429721Z [SETUP] Installed Python version: Python 3.13.2 2025-05-07T19:44:16.7431060Z [SETUP] Successfully created Conda environment: build_binary 2025-05-07T19:44:16.7501882Z ##[group]Run . $PRELUDE; install_cxx_compiler $BUILD_ENV gcc 2025-05-07T19:44:16.7502443Z . 
$PRELUDE; install_cxx_compiler $BUILD_ENV gcc 2025-05-07T19:44:16.7503109Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:16.7503505Z env: 2025-05-07T19:44:16.7503760Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:16.7504125Z BUILD_ENV: build_binary 2025-05-07T19:44:16.7504404Z BUILD_TARGET: genai 2025-05-07T19:44:16.7504693Z BUILD_VARIANT: cuda 2025-05-07T19:44:16.7504961Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:16.7505270Z ##[endgroup] 2025-05-07T19:44:17.2011524Z ################################################################################ 2025-05-07T19:44:17.2011919Z # Install C/C++ Compilers 2025-05-07T19:44:17.2012190Z # 2025-05-07T19:44:17.2042635Z # [2025-05-07T19:44:17.203Z] + install_cxx_compiler build_binary gcc 2025-05-07T19:44:17.2043984Z ################################################################################ 2025-05-07T19:44:17.2044404Z 2025-05-07T19:44:17.2061687Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:17.2922797Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:17.2939633Z [INSTALL] Installing GLIBC (architecture = 64) ... 
2025-05-07T19:44:17.2966461Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y sysroot_linux-64=2.17 2025-05-07T19:44:17.9624083Z Channels: 2025-05-07T19:44:17.9624775Z - conda-forge 2025-05-07T19:44:17.9625068Z Platform: linux-64 2025-05-07T19:44:21.0140328Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:21.4392901Z Solving environment: \ done 2025-05-07T19:44:21.4872347Z 2025-05-07T19:44:21.4872762Z ## Package Plan ## 2025-05-07T19:44:21.4873018Z 2025-05-07T19:44:21.4873533Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:21.4874087Z 2025-05-07T19:44:21.4874314Z added / updated specs: 2025-05-07T19:44:21.4874882Z - sysroot_linux-64=2.17 2025-05-07T19:44:21.4875074Z 2025-05-07T19:44:21.4875079Z 2025-05-07T19:44:21.4875210Z The following packages will be downloaded: 2025-05-07T19:44:21.4875737Z 2025-05-07T19:44:21.4875862Z package | build 2025-05-07T19:44:21.4876232Z ---------------------------|----------------- 2025-05-07T19:44:21.4876684Z kernel-headers_linux-64-3.10.0| he073ed8_18 921 KB conda-forge 2025-05-07T19:44:21.4877234Z sysroot_linux-64-2.17 | h0157908_18 14.5 MB conda-forge 2025-05-07T19:44:21.4877673Z ------------------------------------------------------------ 2025-05-07T19:44:21.4878058Z Total: 15.4 MB 2025-05-07T19:44:21.4878283Z 2025-05-07T19:44:21.4878419Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:21.4878677Z 2025-05-07T19:44:21.4878984Z kernel-headers_li~ conda-forge/noarch::kernel-headers_linux-64-3.10.0-he073ed8_18 2025-05-07T19:44:21.4879763Z sysroot_linux-64 conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18 2025-05-07T19:44:21.4880100Z 2025-05-07T19:44:21.4880110Z 2025-05-07T19:44:21.4880113Z 2025-05-07T19:44:21.4880386Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:22.3269650Z kernel-headers_linux | 921 KB | ########## | 100% 2025-05-07T19:44:22.3271863Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:22.3273137Z done 2025-05-07T19:44:22.4283218Z Preparing transaction: / done 2025-05-07T19:44:22.6295760Z Verifying transaction: \ | done 2025-05-07T19:44:22.7303825Z Executing transaction: - done 2025-05-07T19:44:22.8174556Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:44:22.8174894Z [CHECK] CONDA_PREFIX is not set. 2025-05-07T19:44:24.4538206Z [CHECK] libstdc++.so.6 found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libstdc++.so.6 2025-05-07T19:44:24.4561388Z [INSTALL] Installing GCC (11.4.0, 64) through Conda ...
2025-05-07T19:44:24.4590268Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y gxx_linux-64=11.4.0 2025-05-07T19:44:25.1611893Z Channels: 2025-05-07T19:44:25.1612578Z - conda-forge 2025-05-07T19:44:25.1613247Z Platform: linux-64 2025-05-07T19:44:28.2046424Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:29.3316192Z Solving environment: \ | / - done 2025-05-07T19:44:29.3803128Z 2025-05-07T19:44:29.3804277Z ## Package Plan ## 2025-05-07T19:44:29.3804766Z 2025-05-07T19:44:29.3805508Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:29.3805834Z 2025-05-07T19:44:29.3805940Z added / updated specs: 2025-05-07T19:44:29.3806235Z - gxx_linux-64=11.4.0 2025-05-07T19:44:29.3806403Z 2025-05-07T19:44:29.3806407Z 2025-05-07T19:44:29.3806541Z The following packages will be downloaded: 2025-05-07T19:44:29.3806796Z 2025-05-07T19:44:29.3806920Z package | build 2025-05-07T19:44:29.3807305Z ---------------------------|----------------- 2025-05-07T19:44:29.3807740Z binutils_impl_linux-64-2.40| ha1999f0_7 6.0 MB conda-forge 2025-05-07T19:44:29.3808579Z binutils_linux-64-2.40 | hb3c18ed_4 28 KB conda-forge 2025-05-07T19:44:29.3809076Z gcc_impl_linux-64-11.4.0 | h00c12a0_13 53.0 MB conda-forge 2025-05-07T19:44:29.3809572Z gcc_linux-64-11.4.0 | ha077dfb_4 31 KB conda-forge 2025-05-07T19:44:29.3810065Z gxx_impl_linux-64-11.4.0 | h634f3ee_13 11.2 MB conda-forge 2025-05-07T19:44:29.3810534Z gxx_linux-64-11.4.0 | h35bfe5d_4 29 KB conda-forge 2025-05-07T19:44:29.3811010Z ld_impl_linux-64-2.40 | hf3520f5_7 691 KB conda-forge 2025-05-07T19:44:29.3811513Z libgcc-devel_linux-64-11.4.0| h8f596e0_113 2.3 MB conda-forge 2025-05-07T19:44:29.3812041Z libsanitizer-11.4.0 | h5763a12_13 3.5 MB conda-forge 2025-05-07T19:44:29.3812524Z libstdcxx-15.1.0 | h8f9b012_2 3.7 MB conda-forge 2025-05-07T19:44:29.3813053Z libstdcxx-devel_linux-64-11.4.0| h8f596e0_113 11.1 MB conda-forge 2025-05-07T19:44:29.3813593Z 
libstdcxx-ng-15.1.0 | h4852527_2 34 KB conda-forge 2025-05-07T19:44:29.3814027Z ------------------------------------------------------------ 2025-05-07T19:44:29.3814409Z Total: 91.6 MB 2025-05-07T19:44:29.3814635Z 2025-05-07T19:44:29.3814775Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:29.3815039Z 2025-05-07T19:44:29.3815340Z binutils_impl_lin~ conda-forge/linux-64::binutils_impl_linux-64-2.40-ha1999f0_7 2025-05-07T19:44:29.3815979Z binutils_linux-64 conda-forge/linux-64::binutils_linux-64-2.40-hb3c18ed_4 2025-05-07T19:44:29.3816567Z gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-11.4.0-h00c12a0_13 2025-05-07T19:44:29.3817333Z gcc_linux-64 conda-forge/linux-64::gcc_linux-64-11.4.0-ha077dfb_4 2025-05-07T19:44:29.3817891Z gxx_impl_linux-64 conda-forge/linux-64::gxx_impl_linux-64-11.4.0-h634f3ee_13 2025-05-07T19:44:29.3818460Z gxx_linux-64 conda-forge/linux-64::gxx_linux-64-11.4.0-h35bfe5d_4 2025-05-07T19:44:29.3819054Z libgcc-devel_linu~ conda-forge/noarch::libgcc-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:29.3819812Z libsanitizer conda-forge/linux-64::libsanitizer-11.4.0-h5763a12_13 2025-05-07T19:44:29.3820359Z libstdcxx conda-forge/linux-64::libstdcxx-15.1.0-h8f9b012_2 2025-05-07T19:44:29.3820974Z libstdcxx-devel_l~ conda-forge/noarch::libstdcxx-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:29.3821379Z 2025-05-07T19:44:29.3821506Z The following packages will be UPDATED: 2025-05-07T19:44:29.3821755Z 2025-05-07T19:44:29.3822102Z ld_impl_linux-64 pkgs/main::ld_impl_linux-64-2.40-h12e~ --> conda-forge::ld_impl_linux-64-2.40-hf3520f5_7 2025-05-07T19:44:29.3822919Z libstdcxx-ng pkgs/main::libstdcxx-ng-11.2.0-h12345~ --> conda-forge::libstdcxx-ng-15.1.0-h4852527_2 2025-05-07T19:44:29.3823375Z 2025-05-07T19:44:29.3823379Z 2025-05-07T19:44:29.3823388Z 2025-05-07T19:44:29.3823544Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:30.1402404Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100%
2025-05-07T19:44:30.2457740Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100%
2025-05-07T19:44:30.2492633Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100%
2025-05-07T19:44:30.2829642Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100%
2025-05-07T19:44:30.3045270Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100%
2025-05-07T19:44:30.3537781Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100%
2025-05-07T19:44:30.3751240Z binutils_impl_linux- | 6.0 MB | ########## | 100%
2025-05-07T19:44:30.3981092Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100%
2025-05-07T19:44:30.4508232Z binutils_linux-64-2. | 28 KB | ########## | 100%
2025-05-07T19:44:30.5664205Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100%
2025-05-07T19:44:30.7660200Z libstdcxx-devel_linu | 11.1 MB | ########## | 100%
2025-05-07T19:44:31.2913996Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100%
2025-05-07T19:44:31.2921600Z done
2025-05-07T19:44:31.3928139Z Preparing transaction: done
2025-05-07T19:44:32.1950505Z Verifying transaction: done
2025-05-07T19:44:32.2964970Z Executing transaction: done
2025-05-07T19:44:32.3872618Z [INSTALL] Setting the C/C++ compiler symlinks ...
2025-05-07T19:44:36.0916811Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:36.0918649Z 2025-05-07T19:44:36.0941297Z 2025-05-07T19:44:36.0965296Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:36.0965960Z 2025-05-07T19:44:36.0983266Z 2025-05-07T19:44:36.1005142Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:36.1005778Z 2025-05-07T19:44:36.1020008Z 2025-05-07T19:44:36.1039550Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:36.1041336Z 2025-05-07T19:44:36.1051862Z 2025-05-07T19:44:37.9110779Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:44:37.9111596Z 2025-05-07T19:44:37.9752987Z [CHECK] Binary cc found in PATH 2025-05-07T19:44:39.7513372Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:44:39.7514201Z 2025-05-07T19:44:39.8100803Z [CHECK] Binary gcc found in PATH 2025-05-07T19:44:41.5937501Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:44:41.5938833Z 2025-05-07T19:44:41.6506168Z [CHECK] Binary c++ found in PATH 2025-05-07T19:44:43.4422896Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:44:43.4423724Z 2025-05-07T19:44:43.4994977Z [CHECK] Binary g++ found in PATH 2025-05-07T19:44:43.4996596Z [INFO] Printing out all preprocessor defines in the C compiler ... 
2025-05-07T19:44:43.4997342Z + conda run -n build_binary cc -dM -E - 2025-05-07T19:44:43.4997625Z 2025-05-07T19:44:45.3094777Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:44:45.3096439Z #define __UINT_LEAST16_MAX__ 0xffff 2025-05-07T19:44:45.3098003Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:44:45.3098806Z #define __FLT128_MAX_10_EXP__ 4932 2025-05-07T19:44:45.3100035Z #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F 2025-05-07T19:44:45.3102227Z #define __GCC_IEC_559_COMPLEX 2 2025-05-07T19:44:45.3102951Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:44:45.3103282Z #define __SIZEOF_FLOAT80__ 16 2025-05-07T19:44:45.3103578Z #define __INTMAX_C(c) c ## L 2025-05-07T19:44:45.3103862Z #define __CHAR_BIT__ 8 2025-05-07T19:44:45.3104128Z #define __UINT8_MAX__ 0xff 2025-05-07T19:44:45.3104397Z #define __SCHAR_WIDTH__ 8 2025-05-07T19:44:45.3104656Z #define __WINT_MAX__ 0xffffffffU 2025-05-07T19:44:45.3104950Z #define __FLT32_MIN_EXP__ (-125) 2025-05-07T19:44:45.3105234Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:44:45.3105565Z #define __SIZE_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3105886Z #define __WCHAR_MAX__ 0x7fffffff 2025-05-07T19:44:45.3106202Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:44:45.3106545Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:44:45.3106901Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:44:45.3107349Z #define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L) 2025-05-07T19:44:45.3108320Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:44:45.3108650Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:44:45.3108923Z #define __GCC_IEC_559 2 2025-05-07T19:44:45.3109180Z #define __FLT32X_DECIMAL_DIG__ 17 2025-05-07T19:44:45.3109449Z #define __FLT_EVAL_METHOD__ 0 2025-05-07T19:44:45.3109720Z #define __FLT64_DECIMAL_DIG__ 17 2025-05-07T19:44:45.3109990Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:44:45.3110328Z #define 
__UINT_FAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3110648Z #define __SIG_ATOMIC_TYPE__ int 2025-05-07T19:44:45.3110923Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:44:45.3111202Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:44:45.3111456Z #define __FLT32X_MAX_EXP__ 1024 2025-05-07T19:44:45.3111722Z #define __FLT32_HAS_DENORM__ 1 2025-05-07T19:44:45.3111975Z #define __UINT_FAST8_MAX__ 0xff 2025-05-07T19:44:45.3112243Z #define __FLT32_MAX_10_EXP__ 38 2025-05-07T19:44:45.3112499Z #define __DEC64_MAX_EXP__ 385 2025-05-07T19:44:45.3112759Z #define __INT8_C(c) c 2025-05-07T19:44:45.3112993Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:44:45.3113298Z #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3113618Z #define __SHRT_MAX__ 0x7fff 2025-05-07T19:44:45.3113940Z #define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:45.3114308Z #define __FLT64X_MAX_10_EXP__ 4932 2025-05-07T19:44:45.3114577Z #define __LDBL_IS_IEC_60559__ 2 2025-05-07T19:44:45.3114852Z #define __FLT64X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3115123Z #define __UINT_LEAST8_MAX__ 0xff 2025-05-07T19:44:45.3115407Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:44:45.3115791Z #define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128 2025-05-07T19:44:45.3116218Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:44:45.3116501Z #define __linux 1 2025-05-07T19:44:45.3116735Z #define __DEC32_EPSILON__ 1E-6DF 2025-05-07T19:44:45.3117023Z #define __FLT_EVAL_METHOD_TS_18661_3__ 0 2025-05-07T19:44:45.3117296Z #define __unix 1 2025-05-07T19:44:45.3117530Z #define __UINT32_MAX__ 0xffffffffU 2025-05-07T19:44:45.3117800Z #define __FLT128_MIN_EXP__ (-16381) 2025-05-07T19:44:45.3118077Z #define __WINT_MIN__ 0U 2025-05-07T19:44:45.3118440Z #define __FLT128_MIN_10_EXP__ (-4931) 2025-05-07T19:44:45.3118734Z #define __FLT32X_IS_IEC_60559__ 2 2025-05-07T19:44:45.3119001Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:44:45.3119276Z 
#define __SCHAR_MAX__ 0x7f 2025-05-07T19:44:45.3119521Z #define __FLT128_MANT_DIG__ 113 2025-05-07T19:44:45.3119840Z #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1) 2025-05-07T19:44:45.3120147Z #define __INT64_C(c) c ## L 2025-05-07T19:44:45.3120410Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:44:45.3120717Z #define __FLT32X_MANT_DIG__ 53 2025-05-07T19:44:45.3120976Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:44:45.3121338Z #define __FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x 2025-05-07T19:44:45.3121713Z #define __STDC_HOSTED__ 1 2025-05-07T19:44:45.3121976Z #define __DEC64_MIN_EXP__ (-382) 2025-05-07T19:44:45.3122256Z #define __DBL_DIG__ 15 2025-05-07T19:44:45.3122480Z #define __FLT32_DIG__ 6 2025-05-07T19:44:45.3122796Z #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F 2025-05-07T19:44:45.3123150Z #define __SHRT_WIDTH__ 16 2025-05-07T19:44:45.3123415Z #define __FLT32_IS_IEC_60559__ 2 2025-05-07T19:44:45.3123736Z #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L 2025-05-07T19:44:45.3124097Z #define __STDC_UTF_16__ 1 2025-05-07T19:44:45.3124341Z #define __DBL_IS_IEC_60559__ 2 2025-05-07T19:44:45.3124617Z #define __DEC32_MAX__ 9.999999E96DF 2025-05-07T19:44:45.3125006Z #define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x 2025-05-07T19:44:45.3125405Z #define __FLT32X_HAS_INFINITY__ 1 2025-05-07T19:44:45.3125695Z #define __INT32_MAX__ 0x7fffffff 2025-05-07T19:44:45.3125948Z #define __unix__ 1 2025-05-07T19:44:45.3126185Z #define __INT_WIDTH__ 32 2025-05-07T19:44:45.3126427Z #define __SIZEOF_LONG__ 8 2025-05-07T19:44:45.3126684Z #define __STDC_IEC_559__ 1 2025-05-07T19:44:45.3127004Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:44:45.3127277Z #define __UINT16_C(c) c 2025-05-07T19:44:45.3127506Z #define __DECIMAL_DIG__ 21 2025-05-07T19:44:45.3127767Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:44:45.3128131Z #define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64 
2025-05-07T19:44:45.3128485Z #define __gnu_linux__ 1 2025-05-07T19:44:45.3128733Z #define __FLT128_IS_IEC_60559__ 2 2025-05-07T19:44:45.3128999Z #define __FLT64X_MIN_10_EXP__ (-4931) 2025-05-07T19:44:45.3129287Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3129543Z #define __FLT64_MANT_DIG__ 53 2025-05-07T19:44:45.3129808Z #define __FLT64X_MANT_DIG__ 64 2025-05-07T19:44:45.3130050Z #define __GNUC__ 11 2025-05-07T19:44:45.3130269Z #define __pie__ 2 2025-05-07T19:44:45.3130472Z #define __MMX__ 1 2025-05-07T19:44:45.3130700Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:44:45.3130975Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:44:45.3131246Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:44:45.3131531Z #define __FLT64_MAX_10_EXP__ 308 2025-05-07T19:44:45.3131876Z #define __DBL_MAX__ ((double)1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:45.3132284Z #define __INT_FAST32_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3132588Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:44:45.3132859Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:44:45.3133115Z #define __HAVE_SPECULATION_SAFE_VALUE 1 2025-05-07T19:44:45.3133419Z #define __DEC32_MIN_EXP__ (-94) 2025-05-07T19:44:45.3133672Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:44:45.3133934Z #define __FLT64X_HAS_INFINITY__ 1 2025-05-07T19:44:45.3134228Z #define __UINT_LEAST32_MAX__ 0xffffffffU 2025-05-07T19:44:45.3134515Z #define __FLT32X_HAS_DENORM__ 1 2025-05-07T19:44:45.3134789Z #define __INT_FAST16_TYPE__ long int 2025-05-07T19:44:45.3135061Z #define __MMX_WITH_SSE__ 1 2025-05-07T19:44:45.3135322Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:44:45.3135580Z #define __FLT128_HAS_INFINITY__ 1 2025-05-07T19:44:45.3135858Z #define __DEC32_MIN__ 1E-95DF 2025-05-07T19:44:45.3136111Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:44:45.3136372Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:44:45.3136742Z #define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:45.3137107Z #define __DEC128_EPSILON__ 
1E-33DL 2025-05-07T19:44:45.3137387Z #define __SSE2_MATH__ 1 2025-05-07T19:44:45.3137625Z #define __ATOMIC_HLE_RELEASE 131072 2025-05-07T19:44:45.3137942Z #define __PTRDIFF_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3138231Z #define __amd64 1 2025-05-07T19:44:45.3138463Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:44:45.3138721Z #define __ATOMIC_HLE_ACQUIRE 65536 2025-05-07T19:44:45.3139034Z #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL 2025-05-07T19:44:45.3139338Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:44:45.3139697Z #define __FLT64X_MIN_EXP__ (-16381) 2025-05-07T19:44:45.3140147Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:44:45.3140429Z #define __LONG_LONG_WIDTH__ 64 2025-05-07T19:44:45.3140719Z #define __FLT32_MAX_EXP__ 128 2025-05-07T19:44:45.3140996Z #define __GXX_ABI_VERSION 1016 2025-05-07T19:44:45.3141286Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:44:45.3141564Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:44:45.3141867Z #define __INT16_MAX__ 0x7fff 2025-05-07T19:44:45.3142116Z #define __x86_64 1 2025-05-07T19:44:45.3142366Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:44:45.3142756Z #define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64 2025-05-07T19:44:45.3143262Z #define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L) 2025-05-07T19:44:45.3143759Z #define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128 2025-05-07T19:44:45.3144258Z #define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:45.3144682Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:44:45.3144942Z #define __LP64__ 1 2025-05-07T19:44:45.3145190Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3145559Z #define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x 2025-05-07T19:44:45.3146169Z #define __DECIMAL_BID_FORMAT__ 1 2025-05-07T19:44:45.3146438Z #define __FLT64_MIN_EXP__ (-1021) 2025-05-07T19:44:45.3146730Z #define __FLT64_MIN_10_EXP__ (-307) 
2025-05-07T19:44:45.3147021Z #define __FLT64X_DECIMAL_DIG__ 21 2025-05-07T19:44:45.3147288Z #define __DEC128_MIN__ 1E-6143DL 2025-05-07T19:44:45.3147563Z #define __REGISTER_PREFIX__ 2025-05-07T19:44:45.3147812Z #define __UINT16_MAX__ 0xffff 2025-05-07T19:44:45.3148074Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:44:45.3148327Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:44:45.3148664Z #define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32 2025-05-07T19:44:45.3149020Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:44:45.3149298Z #define __FLT_DIG__ 6 2025-05-07T19:44:45.3149520Z #define __NO_INLINE__ 1 2025-05-07T19:44:45.3149771Z #define __DEC_EVAL_METHOD__ 2 2025-05-07T19:44:45.3150100Z #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL 2025-05-07T19:44:45.3150448Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:44:45.3150716Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:44:45.3150974Z #define __VERSION__ "11.4.0" 2025-05-07T19:44:45.3151244Z #define __UINT64_C(c) c ## UL 2025-05-07T19:44:45.3151494Z #define _STDC_PREDEF_H 1 2025-05-07T19:44:45.3151758Z #define __INT_LEAST32_MAX__ 0x7fffffff 2025-05-07T19:44:45.3152049Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:44:45.3152349Z #define __FLT128_MAX_EXP__ 16384 2025-05-07T19:44:45.3152610Z #define __FLT32_MANT_DIG__ 24 2025-05-07T19:44:45.3152926Z #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:45.3153272Z #define __FLT128_HAS_DENORM__ 1 2025-05-07T19:44:45.3153533Z #define __FLT32_DECIMAL_DIG__ 9 2025-05-07T19:44:45.3153807Z #define __FLT128_DIG__ 33 2025-05-07T19:44:45.3154042Z #define __INT32_C(c) c 2025-05-07T19:44:45.3154295Z #define __DEC64_EPSILON__ 1E-15DD 2025-05-07T19:44:45.3154568Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:44:45.3154855Z #define __DEC128_MIN_EXP__ (-6142) 2025-05-07T19:44:45.3155132Z #define __INT_FAST32_TYPE__ long int 2025-05-07T19:44:45.3155460Z #define __UINT_LEAST16_TYPE__ short unsigned int 2025-05-07T19:44:45.3155842Z 
#define unix 1 2025-05-07T19:44:45.3156066Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:44:45.3156392Z #define __UINT64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3156690Z #define __FLT_IS_IEC_60559__ 2 2025-05-07T19:44:45.3157016Z #define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE" 2025-05-07T19:44:45.3157339Z #define __FLT64X_DIG__ 18 2025-05-07T19:44:45.3157598Z #define __INT8_TYPE__ signed char 2025-05-07T19:44:45.3157848Z #define __ELF__ 1 2025-05-07T19:44:45.3175141Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:44:45.3175580Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:44:45.3175881Z #define __FLT_RADIX__ 2 2025-05-07T19:44:45.3176128Z #define __INT_LEAST16_TYPE__ short int 2025-05-07T19:44:45.3176676Z #define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L 2025-05-07T19:44:45.3177066Z #define __UINTMAX_C(c) c ## UL 2025-05-07T19:44:45.3177358Z #define __SSE_MATH__ 1 2025-05-07T19:44:45.3177590Z #define __k8 1 2025-05-07T19:44:45.3177912Z #define __FLT32X_MIN__ 2.22507385850720138309023271733240406e-308F32x 2025-05-07T19:44:45.3178320Z #define __SIG_ATOMIC_MAX__ 0x7fffffff 2025-05-07T19:44:45.3178640Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:44:45.3178962Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:44:45.3179227Z #define __LDBL_DIG__ 18 2025-05-07T19:44:45.3179596Z #define __FLT64_IS_IEC_60559__ 2 2025-05-07T19:44:45.3179862Z #define __x86_64__ 1 2025-05-07T19:44:45.3180295Z #define __FLT32X_MIN_EXP__ (-1021) 2025-05-07T19:44:45.3180610Z #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF 2025-05-07T19:44:45.3180979Z #define __INT_FAST16_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3181301Z #define __FLT64_DIG__ 15 2025-05-07T19:44:45.3181609Z #define __UINT_FAST32_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3181982Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:44:45.3182492Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3182791Z #define __FLT_MAX_10_EXP__ 38 
2025-05-07T19:44:45.3183081Z #define __LONG_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3183414Z #define __FLT64X_HAS_DENORM__ 1 2025-05-07T19:44:45.3183800Z #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL 2025-05-07T19:44:45.3184246Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:44:45.3184549Z #define __GNUC_EXECUTION_CHARSET_NAME "UTF-8" 2025-05-07T19:44:45.3184920Z #define __UINT_FAST16_TYPE__ long unsigned int 2025-05-07T19:44:45.3185257Z #define __DEC64_MAX__ 9.999999999999999E384DD 2025-05-07T19:44:45.3185587Z #define __INT_FAST32_WIDTH__ 64 2025-05-07T19:44:45.3185898Z #define __CHAR16_TYPE__ short unsigned int 2025-05-07T19:44:45.3186218Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:44:45.3186524Z #define __SIZE_WIDTH__ 64 2025-05-07T19:44:45.3186768Z #define __SEG_FS 1 2025-05-07T19:44:45.3187021Z #define __INT_LEAST16_MAX__ 0x7fff 2025-05-07T19:44:45.3187310Z #define __DEC64_MANT_DIG__ 16 2025-05-07T19:44:45.3187614Z #define __INT64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3187912Z #define __SEG_GS 1 2025-05-07T19:44:45.3188262Z #define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32 2025-05-07T19:44:45.3188667Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:44:45.3188967Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:44:45.3189282Z #define __INT16_TYPE__ short int 2025-05-07T19:44:45.3189573Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:44:45.3189898Z #define __STDC_VERSION__ 201710L 2025-05-07T19:44:45.3190174Z #define __SIZEOF_INT__ 4 2025-05-07T19:44:45.3190443Z #define __DEC32_MAX_EXP__ 97 2025-05-07T19:44:45.3190715Z #define __INT_FAST8_MAX__ 0x7f 2025-05-07T19:44:45.3191087Z #define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:45.3191498Z #define __INTPTR_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3191815Z #define linux 1 2025-05-07T19:44:45.3192057Z #define __FLT64_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3192444Z #define __FLT32_MIN_10_EXP__ 
(-37) 2025-05-07T19:44:45.3192723Z #define __FLT32X_DIG__ 15 2025-05-07T19:44:45.3192964Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:44:45.3193305Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:44:45.3193563Z #define __FLT64_HAS_INFINITY__ 1 2025-05-07T19:44:45.3193919Z #define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:45.3194328Z #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) 2025-05-07T19:44:45.3194676Z #define __code_model_small__ 1 2025-05-07T19:44:45.3194943Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:44:45.3195239Z #define __DEC32_MANT_DIG__ 7 2025-05-07T19:44:45.3195493Z #define __k8__ 1 2025-05-07T19:44:45.3195713Z #define __INTPTR_TYPE__ long int 2025-05-07T19:44:45.3196010Z #define __UINT16_TYPE__ short unsigned int 2025-05-07T19:44:45.3196304Z #define __WCHAR_TYPE__ int 2025-05-07T19:44:45.3196550Z #define __pic__ 2 2025-05-07T19:44:45.3196793Z #define __UINTPTR_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3197123Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:44:45.3197408Z #define __INT_FAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3197743Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:44:45.3198103Z #define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:45.3198469Z #define __FLT32_HAS_INFINITY__ 1 2025-05-07T19:44:45.3198749Z #define __FLT64X_MAX_EXP__ 16384 2025-05-07T19:44:45.3199036Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:44:45.3199354Z #define __INT_MAX__ 0x7fffffff 2025-05-07T19:44:45.3199594Z #define __linux__ 1 2025-05-07T19:44:45.3199829Z #define __INT64_TYPE__ long int 2025-05-07T19:44:45.3200078Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:44:45.3200520Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:44:45.3200991Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:44:45.3201302Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:44:45.3201624Z #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3201987Z #define 
__GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:44:45.3202437Z #define __DEC64_MIN__ 1E-383DD 2025-05-07T19:44:45.3202737Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:44:45.3203052Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:44:45.3203385Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:44:45.3203730Z #define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:45.3204128Z #define __SSE__ 1 2025-05-07T19:44:45.3204365Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:44:45.3204741Z #define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:45.3205121Z #define __amd64__ 1 2025-05-07T19:44:45.3205352Z #define __WINT_WIDTH__ 32 2025-05-07T19:44:45.3205625Z #define __INT_LEAST8_MAX__ 0x7f 2025-05-07T19:44:45.3205906Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:44:45.3206202Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:44:45.3206477Z #define __FLT32X_MAX_10_EXP__ 308 2025-05-07T19:44:45.3206773Z #define __SIZEOF_INT128__ 16 2025-05-07T19:44:45.3207042Z #define __FLT64X_IS_IEC_60559__ 2 2025-05-07T19:44:45.3207336Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:44:45.3207611Z #define __ATOMIC_RELAXED 0 2025-05-07T19:44:45.3207994Z #define __DBL_EPSILON__ ((double)2.22044604925031308084726333618164062e-16L) 2025-05-07T19:44:45.3208504Z #define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128 2025-05-07T19:44:45.3208880Z #define _LP64 1 2025-05-07T19:44:45.3209113Z #define __UINT8_C(c) c 2025-05-07T19:44:45.3209359Z #define __FLT64_MAX_EXP__ 1024 2025-05-07T19:44:45.3209647Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:44:45.3209924Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:44:45.3210219Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:44:45.3210537Z #define __GNUC_PATCHLEVEL__ 0 2025-05-07T19:44:45.3210923Z #define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:45.3211435Z #define __FLT64_NORM_MAX__ 
1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:45.3211832Z #define __FLT128_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3212159Z #define __INTMAX_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:45.3212484Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:44:45.3212983Z #define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x 2025-05-07T19:44:45.3213531Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:44:45.3213924Z #define __FLT64_HAS_DENORM__ 1 2025-05-07T19:44:45.3214256Z #define __FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32 2025-05-07T19:44:45.3214631Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:44:45.3214899Z #define __STDC_UTF_32__ 1 2025-05-07T19:44:45.3215140Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:44:45.3215398Z #define __FXSR__ 1 2025-05-07T19:44:45.3215688Z #define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:45.3216150Z #define __DBL_NORM_MAX__ ((double)1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:45.3216559Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:45.3216875Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:44:45.3217126Z #define __UINT32_C(c) c ## U 2025-05-07T19:44:45.3217465Z #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F 2025-05-07T19:44:45.3217836Z #define __INT8_MAX__ 0x7f 2025-05-07T19:44:45.3218067Z #define __LONG_WIDTH__ 64 2025-05-07T19:44:45.3218307Z #define __PIC__ 2 2025-05-07T19:44:45.3218548Z #define __UINT_FAST32_TYPE__ long unsigned int 2025-05-07T19:44:45.3218958Z #define __FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:45.3219338Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:44:45.3219759Z #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:45.3220276Z #define __SSE2__ 1 2025-05-07T19:44:45.3220529Z #define __INT32_TYPE__ int 2025-05-07T19:44:45.3220853Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:44:45.3221139Z #define __FLT_MIN_10_EXP__ 
(-37) 2025-05-07T19:44:45.3221507Z #define __FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64 2025-05-07T19:44:45.3221887Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:44:45.3222256Z #define __INTMAX_TYPE__ long int 2025-05-07T19:44:45.3222574Z #define __DEC128_MAX_EXP__ 6145 2025-05-07T19:44:45.3222882Z #define __FLT32X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3223172Z #define __ATOMIC_CONSUME 1 2025-05-07T19:44:45.3223450Z #define __GNUC_MINOR__ 4 2025-05-07T19:44:45.3223726Z #define __INT_FAST16_WIDTH__ 64 2025-05-07T19:44:45.3224024Z #define __UINTMAX_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3224349Z #define __PIE__ 2 2025-05-07T19:44:45.3224687Z #define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x 2025-05-07T19:44:45.3225117Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:44:45.3225473Z #define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L 2025-05-07T19:44:45.3225869Z #define __INT16_C(c) c 2025-05-07T19:44:45.3226094Z #define __STDC__ 1 2025-05-07T19:44:45.3226345Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:44:45.3226630Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:44:45.3226905Z #define __FLT32X_MIN_10_EXP__ (-307) 2025-05-07T19:44:45.3227240Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:44:45.3227605Z #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD 2025-05-07T19:44:45.3227974Z #define __DEC128_MANT_DIG__ 34 2025-05-07T19:44:45.3228249Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:44:45.3228555Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:44:45.3228828Z #define __FLT128_DECIMAL_DIG__ 36 2025-05-07T19:44:45.3229136Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:44:45.3229441Z #define __FLT32_HAS_QUIET_NAN__ 1 2025-05-07T19:44:45.3229738Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:44:45.3230064Z #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:45.3230473Z #define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:45.3230880Z 
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:44:45.3231193Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:44:45.3231519Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:44:45.3231778Z #define __ATOMIC_RELEASE 3 2025-05-07T19:44:45.3231963Z 2025-05-07T19:44:45.3685563Z 2025-05-07T19:44:45.3686517Z [INFO] Printing out all preprocessor defines in the C++ compiler ... 2025-05-07T19:44:45.3687629Z + conda run -n build_binary c++ -dM -E -x c++ - 2025-05-07T19:44:45.3687880Z 2025-05-07T19:44:47.1788742Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:44:47.1789368Z #define __cpp_attributes 200809L 2025-05-07T19:44:47.1789849Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:44:47.1790242Z #define __UINT_LEAST16_MAX__ 0xffff 2025-05-07T19:44:47.1790691Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:44:47.1790980Z #define __FLT128_MAX_10_EXP__ 4932 2025-05-07T19:44:47.1791354Z #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F 2025-05-07T19:44:47.1791732Z #define __GCC_IEC_559_COMPLEX 2 2025-05-07T19:44:47.1792054Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:44:47.1792396Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:44:47.1792751Z #define __SIZEOF_FLOAT80__ 16 2025-05-07T19:44:47.1793092Z #define __INTMAX_C(c) c ## L 2025-05-07T19:44:47.1793365Z #define __CHAR_BIT__ 8 2025-05-07T19:44:47.1793628Z #define __UINT8_MAX__ 0xff 2025-05-07T19:44:47.1793906Z #define __SCHAR_WIDTH__ 8 2025-05-07T19:44:47.1794180Z #define __WINT_MAX__ 0xffffffffU 2025-05-07T19:44:47.1794464Z #define __FLT32_MIN_EXP__ (-125) 2025-05-07T19:44:47.1794767Z #define __cpp_static_assert 201411L 2025-05-07T19:44:47.1795065Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:44:47.1795395Z #define __SIZE_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1795707Z #define __WCHAR_MAX__ 0x7fffffff 2025-05-07T19:44:47.1796023Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:44:47.1796379Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 
2025-05-07T19:44:47.1796718Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:44:47.1797159Z #define __DBL_DENORM_MIN__ double(4.94065645841246544176568792868221372e-324L) 2025-05-07T19:44:47.1797597Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:44:47.1798341Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:44:47.1798657Z #define __GCC_IEC_559 2 2025-05-07T19:44:47.1798936Z #define __FLT32X_DECIMAL_DIG__ 17 2025-05-07T19:44:47.1799241Z #define __FLT_EVAL_METHOD__ 0 2025-05-07T19:44:47.1799542Z #define __cpp_binary_literals 201304L 2025-05-07T19:44:47.1799841Z #define __FLT64_DECIMAL_DIG__ 17 2025-05-07T19:44:47.1800160Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:44:47.1800783Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:44:47.1801289Z #define __cpp_variadic_templates 200704L 2025-05-07T19:44:47.1801687Z #define __UINT_FAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1802053Z #define __SIG_ATOMIC_TYPE__ int 2025-05-07T19:44:47.1802367Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:44:47.1802664Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:44:47.1802983Z #define __cpp_variable_templates 201304L 2025-05-07T19:44:47.1803305Z #define __FLT32X_MAX_EXP__ 1024 2025-05-07T19:44:47.1803605Z #define __FLT32_HAS_DENORM__ 1 2025-05-07T19:44:47.1803909Z #define __UINT_FAST8_MAX__ 0xff 2025-05-07T19:44:47.1804202Z #define __cpp_rvalue_reference 200610L 2025-05-07T19:44:47.1804585Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:44:47.1804941Z #define __DEC64_MAX_EXP__ 385 2025-05-07T19:44:47.1805228Z #define __INT8_C(c) c 2025-05-07T19:44:47.1805476Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:44:47.1805784Z #define __cpp_variadic_using 201611L 2025-05-07T19:44:47.1806127Z #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1806508Z #define __INT_LEAST8_MAX__ 0x7f 2025-05-07T19:44:47.1806797Z #define __cpp_capture_star_this 201603L 2025-05-07T19:44:47.1807125Z #define __SHRT_MAX__ 
0x7fff 2025-05-07T19:44:47.1807567Z #define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:47.1807938Z #define __FLT64X_MAX_10_EXP__ 4932 2025-05-07T19:44:47.1808221Z #define __cpp_if_constexpr 201606L 2025-05-07T19:44:47.1808512Z #define __LDBL_IS_IEC_60559__ 2 2025-05-07T19:44:47.1808778Z #define __FLT64X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1809067Z #define __UINT_LEAST8_MAX__ 0xff 2025-05-07T19:44:47.1809359Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:44:47.1809933Z #define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128 2025-05-07T19:44:47.1810375Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:44:47.1810667Z #define __linux 1 2025-05-07T19:44:47.1810915Z #define __DEC32_EPSILON__ 1E-6DF 2025-05-07T19:44:47.1811193Z #define __FLT_EVAL_METHOD_TS_18661_3__ 0 2025-05-07T19:44:47.1811491Z #define __unix 1 2025-05-07T19:44:47.1811712Z #define __UINT32_MAX__ 0xffffffffU 2025-05-07T19:44:47.1812013Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:44:47.1812300Z #define __FLT128_MIN_EXP__ (-16381) 2025-05-07T19:44:47.1812597Z #define __WINT_MIN__ 0U 2025-05-07T19:44:47.1812859Z #define __FLT128_MIN_10_EXP__ (-4931) 2025-05-07T19:44:47.1813137Z #define __FLT32X_IS_IEC_60559__ 2 2025-05-07T19:44:47.1813428Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:44:47.1813700Z #define __SCHAR_MAX__ 0x7f 2025-05-07T19:44:47.1813973Z #define __FLT128_MANT_DIG__ 113 2025-05-07T19:44:47.1814253Z #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1) 2025-05-07T19:44:47.1814570Z #define __INT64_C(c) c ## L 2025-05-07T19:44:47.1814838Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:44:47.1815168Z #define __FLT32X_MANT_DIG__ 53 2025-05-07T19:44:47.1815450Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:44:47.1815797Z #define __cpp_aligned_new 201606L 2025-05-07T19:44:47.1816124Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:44:47.1816404Z #define __FLT32_MAX_10_EXP__ 38 2025-05-07T19:44:47.1816803Z #define 
__FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x 2025-05-07T19:44:47.1817198Z #define __STDC_HOSTED__ 1 2025-05-07T19:44:47.1817494Z #define __DEC64_MIN_EXP__ (-382) 2025-05-07T19:44:47.1817786Z #define __cpp_decltype_auto 201304L 2025-05-07T19:44:47.1818110Z #define __DBL_DIG__ 15 2025-05-07T19:44:47.1818359Z #define __FLT32_DIG__ 6 2025-05-07T19:44:47.1818798Z #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F 2025-05-07T19:44:47.1819174Z #define __GXX_WEAK__ 1 2025-05-07T19:44:47.1819413Z #define __SHRT_WIDTH__ 16 2025-05-07T19:44:47.1819793Z #define __FLT32_IS_IEC_60559__ 2 2025-05-07T19:44:47.1820320Z #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L 2025-05-07T19:44:47.1820717Z #define __DBL_IS_IEC_60559__ 2 2025-05-07T19:44:47.1821029Z #define __DEC32_MAX__ 9.999999E96DF 2025-05-07T19:44:47.1821374Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:44:47.1821729Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:44:47.1822186Z #define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x 2025-05-07T19:44:47.1822620Z #define __FLT32X_HAS_INFINITY__ 1 2025-05-07T19:44:47.1822928Z #define __INT32_MAX__ 0x7fffffff 2025-05-07T19:44:47.1823220Z #define __unix__ 1 2025-05-07T19:44:47.1823452Z #define __INT_WIDTH__ 32 2025-05-07T19:44:47.1823729Z #define __SIZEOF_LONG__ 8 2025-05-07T19:44:47.1824005Z #define __STDC_IEC_559__ 1 2025-05-07T19:44:47.1824307Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:44:47.1824603Z #define __UINT16_C(c) c 2025-05-07T19:44:47.1824904Z #define __DECIMAL_DIG__ 21 2025-05-07T19:44:47.1825189Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:44:47.1825611Z #define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64 2025-05-07T19:44:47.1826137Z #define __gnu_linux__ 1 2025-05-07T19:44:47.1826419Z #define __INT16_MAX__ 0x7fff 2025-05-07T19:44:47.1826728Z #define __FLT64_MIN_EXP__ (-1021) 2025-05-07T19:44:47.1827020Z #define __FLT64X_MIN_10_EXP__ 
(-4931) 2025-05-07T19:44:47.1827352Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1827636Z #define __FLT64_MANT_DIG__ 53 2025-05-07T19:44:47.1827931Z #define __FLT64X_MANT_DIG__ 64 2025-05-07T19:44:47.1828201Z #define __GNUC__ 11 2025-05-07T19:44:47.1828456Z #define __GXX_RTTI 1 2025-05-07T19:44:47.1828698Z #define __pie__ 2 2025-05-07T19:44:47.1828937Z #define __MMX__ 1 2025-05-07T19:44:47.1829160Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:44:47.1829437Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:44:47.1829734Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:44:47.1830086Z #define __STDC_UTF_16__ 1 2025-05-07T19:44:47.1830357Z #define __FLT64_MAX_10_EXP__ 308 2025-05-07T19:44:47.1830659Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:44:47.1831002Z #define __FLT32_HAS_INFINITY__ 1 2025-05-07T19:44:47.1831347Z #define __DBL_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:47.1831744Z #define __cpp_raw_strings 200710L 2025-05-07T19:44:47.1832047Z #define __INT_FAST32_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1832379Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:44:47.1832655Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:44:47.1832915Z #define __HAVE_SPECULATION_SAFE_VALUE 1 2025-05-07T19:44:47.1833237Z #define __cpp_fold_expressions 201603L 2025-05-07T19:44:47.1833527Z #define __DEC32_MIN_EXP__ (-94) 2025-05-07T19:44:47.1833809Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:44:47.1834073Z #define __FLT64X_HAS_INFINITY__ 1 2025-05-07T19:44:47.1834374Z #define __UINT_LEAST32_MAX__ 0xffffffffU 2025-05-07T19:44:47.1834676Z #define __FLT32X_HAS_DENORM__ 1 2025-05-07T19:44:47.1835085Z #define __INT_FAST16_TYPE__ long int 2025-05-07T19:44:47.1835370Z #define __MMX_WITH_SSE__ 1 2025-05-07T19:44:47.1835640Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:44:47.1835918Z #define __cplusplus 201703L 2025-05-07T19:44:47.1836180Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:44:47.1836485Z #define __DEC32_MIN__ 1E-95DF 
2025-05-07T19:44:47.1836741Z #define __DEPRECATED 1 2025-05-07T19:44:47.1837007Z #define __cpp_rvalue_references 200610L 2025-05-07T19:44:47.1837297Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:44:47.1837572Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:44:47.1837881Z #define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:47.1838254Z #define __DEC128_EPSILON__ 1E-33DL 2025-05-07T19:44:47.1838520Z #define __SSE2_MATH__ 1 2025-05-07T19:44:47.1838845Z #define __ATOMIC_HLE_RELEASE 131072 2025-05-07T19:44:47.1839164Z #define __PTRDIFF_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1839453Z #define __amd64 1 2025-05-07T19:44:47.1839698Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:44:47.1839961Z #define __ATOMIC_HLE_ACQUIRE 65536 2025-05-07T19:44:47.1840242Z #define __GNUG__ 11 2025-05-07T19:44:47.1840494Z #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL 2025-05-07T19:44:47.1840821Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:44:47.1841074Z #define __cpp_nsdmi 200809L 2025-05-07T19:44:47.1841354Z #define __FLT64X_MIN_EXP__ (-16381) 2025-05-07T19:44:47.1841636Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:44:47.1841912Z #define __LONG_LONG_WIDTH__ 64 2025-05-07T19:44:47.1842198Z #define __cpp_initializer_lists 200806L 2025-05-07T19:44:47.1842489Z #define __FLT32_MAX_EXP__ 128 2025-05-07T19:44:47.1842765Z #define __cpp_hex_float 201603L 2025-05-07T19:44:47.1843028Z #define __GXX_ABI_VERSION 1016 2025-05-07T19:44:47.1843302Z #define __FLT128_HAS_INFINITY__ 1 2025-05-07T19:44:47.1843619Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:44:47.1843888Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:44:47.1844174Z #define __x86_64 1 2025-05-07T19:44:47.1844406Z #define __cpp_lambdas 200907L 2025-05-07T19:44:47.1844690Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:44:47.1845087Z #define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64 2025-05-07T19:44:47.1845484Z #define __cpp_template_auto 201606L 2025-05-07T19:44:47.1845867Z #define 
__DBL_MIN__ double(2.22507385850720138309023271733240406e-308L) 2025-05-07T19:44:47.1846325Z #define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128 2025-05-07T19:44:47.1846823Z #define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:47.1847221Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:44:47.1847499Z #define __LP64__ 1 2025-05-07T19:44:47.1847728Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1848103Z #define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x 2025-05-07T19:44:47.1848515Z #define __DECIMAL_BID_FORMAT__ 1 2025-05-07T19:44:47.1848795Z #define __FLT64_MIN_10_EXP__ (-307) 2025-05-07T19:44:47.1849165Z #define __FLT64X_DECIMAL_DIG__ 21 2025-05-07T19:44:47.1849439Z #define __DEC128_MIN__ 1E-6143DL 2025-05-07T19:44:47.1849727Z #define __REGISTER_PREFIX__ 2025-05-07T19:44:47.1849989Z #define __UINT16_MAX__ 0xffff 2025-05-07T19:44:47.1850278Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:44:47.1850606Z #define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32 2025-05-07T19:44:47.1850996Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:44:47.1851274Z #define __FLT_DIG__ 6 2025-05-07T19:44:47.1851517Z #define __NO_INLINE__ 1 2025-05-07T19:44:47.1851781Z #define __DEC_EVAL_METHOD__ 2 2025-05-07T19:44:47.1852096Z #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL 2025-05-07T19:44:47.1852459Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:44:47.1852710Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:44:47.1852985Z #define __VERSION__ "11.4.0" 2025-05-07T19:44:47.1853243Z #define __UINT64_C(c) c ## UL 2025-05-07T19:44:47.1853533Z #define __cpp_unicode_characters 201411L 2025-05-07T19:44:47.1853829Z #define _STDC_PREDEF_H 1 2025-05-07T19:44:47.1854092Z #define __INT_LEAST32_MAX__ 0x7fffffff 2025-05-07T19:44:47.1854397Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:44:47.1854678Z #define __FLT128_MAX_EXP__ 16384 2025-05-07T19:44:47.1854961Z #define __FLT32_MANT_DIG__ 
24 2025-05-07T19:44:47.1855263Z #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:47.1855623Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:44:47.1855920Z #define __FLT128_HAS_DENORM__ 1 2025-05-07T19:44:47.1856200Z #define __FLT32_DECIMAL_DIG__ 9 2025-05-07T19:44:47.1856458Z #define __FLT128_DIG__ 33 2025-05-07T19:44:47.1856714Z #define __INT32_C(c) c 2025-05-07T19:44:47.1856949Z #define __DEC64_EPSILON__ 1E-15DD 2025-05-07T19:44:47.1857241Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:44:47.1857540Z #define __DEC128_MIN_EXP__ (-6142) 2025-05-07T19:44:47.1857923Z #define __INT_FAST32_TYPE__ long int 2025-05-07T19:44:47.1858256Z #define __UINT_LEAST16_TYPE__ short unsigned int 2025-05-07T19:44:47.1858570Z #define unix 1 2025-05-07T19:44:47.1858809Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:44:47.1859068Z #define __cpp_rtti 199711L 2025-05-07T19:44:47.1859352Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:44:47.1859784Z #define __UINT64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1860313Z #define __FLT_IS_IEC_60559__ 2 2025-05-07T19:44:47.1860648Z #define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE" 2025-05-07T19:44:47.1861028Z #define __FLT64X_DIG__ 18 2025-05-07T19:44:47.1861316Z #define __INT8_TYPE__ signed char 2025-05-07T19:44:47.1861624Z #define __cpp_digit_separators 201309L 2025-05-07T19:44:47.1861946Z #define __ELF__ 1 2025-05-07T19:44:47.1862194Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:44:47.1862515Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:44:47.1862810Z #define __FLT_RADIX__ 2 2025-05-07T19:44:47.1863094Z #define __INT_LEAST16_TYPE__ short int 2025-05-07T19:44:47.1863478Z #define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L 2025-05-07T19:44:47.1863893Z #define __UINTMAX_C(c) c ## UL 2025-05-07T19:44:47.1864181Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:44:47.1864488Z #define __k8 1 2025-05-07T19:44:47.1864821Z #define __FLT32X_MIN__ 
2.22507385850720138309023271733240406e-308F32x 2025-05-07T19:44:47.1865219Z #define __SIG_ATOMIC_MAX__ 0x7fffffff 2025-05-07T19:44:47.1865551Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:44:47.1865869Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:44:47.1866266Z #define __LDBL_DIG__ 18 2025-05-07T19:44:47.1866503Z #define __FLT64_IS_IEC_60559__ 2 2025-05-07T19:44:47.1866771Z #define __x86_64__ 1 2025-05-07T19:44:47.1867001Z #define __FLT32X_MIN_EXP__ (-1021) 2025-05-07T19:44:47.1867313Z #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF 2025-05-07T19:44:47.1867662Z #define __INT_FAST16_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1867960Z #define __FLT64_DIG__ 15 2025-05-07T19:44:47.1868255Z #define __UINT_FAST32_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1868602Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:44:47.1869035Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1869298Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:44:47.1869586Z #define __LONG_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1869876Z #define __FLT64X_HAS_DENORM__ 1 2025-05-07T19:44:47.1870256Z #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL 2025-05-07T19:44:47.1870659Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:44:47.1870965Z #define __GNUC_EXECUTION_CHARSET_NAME "UTF-8" 2025-05-07T19:44:47.1871304Z #define __cpp_unicode_literals 200710L 2025-05-07T19:44:47.1871618Z #define __UINT_FAST16_TYPE__ long unsigned int 2025-05-07T19:44:47.1871973Z #define __DEC64_MAX__ 9.999999999999999E384DD 2025-05-07T19:44:47.1872272Z #define __INT_FAST32_WIDTH__ 64 2025-05-07T19:44:47.1872572Z #define __CHAR16_TYPE__ short unsigned int 2025-05-07T19:44:47.1872881Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:44:47.1873174Z #define __SIZE_WIDTH__ 64 2025-05-07T19:44:47.1873411Z #define __SEG_FS 1 2025-05-07T19:44:47.1873659Z #define __INT_LEAST16_MAX__ 0x7fff 2025-05-07T19:44:47.1873949Z #define __DEC64_MANT_DIG__ 16 2025-05-07T19:44:47.1874219Z 
#define __INT64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1874519Z #define __SEG_GS 1 2025-05-07T19:44:47.1874828Z #define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32 2025-05-07T19:44:47.1875225Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:44:47.1875494Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:44:47.1875797Z #define __INT16_TYPE__ short int 2025-05-07T19:44:47.1876076Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:44:47.1876407Z #define __cpp_structured_bindings 201606L 2025-05-07T19:44:47.1876707Z #define __SIZEOF_INT__ 4 2025-05-07T19:44:47.1876972Z #define __DEC32_MAX_EXP__ 97 2025-05-07T19:44:47.1877252Z #define __INT_FAST8_MAX__ 0x7f 2025-05-07T19:44:47.1880249Z #define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:47.1880740Z #define __INTPTR_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1881081Z #define __cpp_sized_deallocation 201309L 2025-05-07T19:44:47.1881434Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:44:47.1881803Z #define linux 1 2025-05-07T19:44:47.1882055Z #define __FLT64_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1882333Z #define __FLT32_MIN_10_EXP__ (-37) 2025-05-07T19:44:47.1882633Z #define __EXCEPTIONS 1 2025-05-07T19:44:47.1882904Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:44:47.1883172Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:44:47.1883456Z #define __cpp_range_based_for 201603L 2025-05-07T19:44:47.1883751Z #define __FLT64_HAS_INFINITY__ 1 2025-05-07T19:44:47.1884124Z #define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:47.1884515Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16 2025-05-07T19:44:47.1884880Z #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) 2025-05-07T19:44:47.1885211Z #define __code_model_small__ 1 2025-05-07T19:44:47.1885506Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:44:47.1885825Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:44:47.1886142Z #define __DEC32_MANT_DIG__ 7 
2025-05-07T19:44:47.1886438Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:44:47.1886724Z #define __k8__ 1 2025-05-07T19:44:47.1886962Z #define __INTPTR_TYPE__ long int 2025-05-07T19:44:47.1887244Z #define __UINT16_TYPE__ short unsigned int 2025-05-07T19:44:47.1887561Z #define __WCHAR_TYPE__ int 2025-05-07T19:44:47.1887800Z #define __pic__ 2 2025-05-07T19:44:47.1888059Z #define __UINTPTR_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1888366Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:44:47.1888647Z #define __cpp_decltype 200707L 2025-05-07T19:44:47.1888954Z #define __INT_FAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1889282Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:44:47.1889666Z #define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:47.1890031Z #define __FLT64X_MAX_EXP__ 16384 2025-05-07T19:44:47.1890356Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:44:47.1890761Z #define __cpp_inline_variables 201606L 2025-05-07T19:44:47.1891066Z #define __INT_MAX__ 0x7fffffff 2025-05-07T19:44:47.1891314Z #define __linux__ 1 2025-05-07T19:44:47.1891556Z #define __INT64_TYPE__ long int 2025-05-07T19:44:47.1891813Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:44:47.1892089Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:44:47.1892384Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:44:47.1892664Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:44:47.1892992Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:44:47.1893284Z #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1893617Z #define __DEC64_MIN__ 1E-383DD 2025-05-07T19:44:47.1893879Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:44:47.1894191Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:44:47.1894487Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:44:47.1894835Z #define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:47.1895214Z #define __SSE__ 1 2025-05-07T19:44:47.1895437Z #define 
__LDBL_MIN_EXP__ (-16381) 2025-05-07T19:44:47.1895788Z #define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:47.1896133Z #define __amd64__ 1 2025-05-07T19:44:47.1896369Z #define __WINT_WIDTH__ 32 2025-05-07T19:44:47.1896617Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:44:47.1896901Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:44:47.1897164Z #define __FLT32X_MAX_10_EXP__ 308 2025-05-07T19:44:47.1897450Z #define __SIZEOF_INT128__ 16 2025-05-07T19:44:47.1897706Z #define __FLT64X_IS_IEC_60559__ 2 2025-05-07T19:44:47.1897988Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:44:47.1898267Z #define __ATOMIC_RELAXED 0 2025-05-07T19:44:47.1898610Z #define __DBL_EPSILON__ double(2.22044604925031308084726333618164062e-16L) 2025-05-07T19:44:47.1899158Z #define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128 2025-05-07T19:44:47.1899646Z #define _LP64 1 2025-05-07T19:44:47.1900059Z #define __UINT8_C(c) c 2025-05-07T19:44:47.1900537Z #define __FLT64_MAX_EXP__ 1024 2025-05-07T19:44:47.1900883Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:44:47.1901166Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:44:47.1901462Z #define __GNUC_PATCHLEVEL__ 0 2025-05-07T19:44:47.1901841Z #define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:47.1902361Z #define __FLT64_NORM_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:47.1902783Z #define __FLT128_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1903095Z #define __INTMAX_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:47.1903440Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:44:47.1903768Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:44:47.1904188Z #define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x 2025-05-07T19:44:47.1904585Z #define __STDCPP_THREADS__ 1 2025-05-07T19:44:47.1904885Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:44:47.1905167Z #define __FLT64_HAS_DENORM__ 1 2025-05-07T19:44:47.1905543Z #define 
__FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32 2025-05-07T19:44:47.1905962Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:44:47.1906237Z #define __STDC_UTF_32__ 1 2025-05-07T19:44:47.1906522Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:44:47.1906786Z #define __FXSR__ 1 2025-05-07T19:44:47.1907127Z #define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:47.1907616Z #define __DBL_NORM_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:47.1908270Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:47.1908600Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:44:47.1908893Z #define __cpp_runtime_arrays 198712L 2025-05-07T19:44:47.1909226Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:44:47.1909540Z #define __UINT32_C(c) c ## U 2025-05-07T19:44:47.1909842Z #define __cpp_alias_templates 200704L 2025-05-07T19:44:47.1910231Z #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F 2025-05-07T19:44:47.1910649Z #define __FLT128_IS_IEC_60559__ 2 2025-05-07T19:44:47.1911080Z #define __INT8_MAX__ 0x7f 2025-05-07T19:44:47.1911365Z #define __LONG_WIDTH__ 64 2025-05-07T19:44:47.1911618Z #define __PIC__ 2 2025-05-07T19:44:47.1911908Z #define __UINT_FAST32_TYPE__ long unsigned int 2025-05-07T19:44:47.1912472Z #define __FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:47.1912983Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:44:47.1913338Z #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:47.1913686Z #define __cpp_constexpr 201603L 2025-05-07T19:44:47.1913968Z #define __SSE2__ 1 2025-05-07T19:44:47.1914208Z #define __cpp_deduction_guides 201703L 2025-05-07T19:44:47.1914520Z #define __INT32_TYPE__ int 2025-05-07T19:44:47.1914773Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:44:47.1915066Z #define __cpp_exceptions 199711L 2025-05-07T19:44:47.1915347Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:44:47.1915699Z #define 
__FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64 2025-05-07T19:44:47.1916081Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:44:47.1916348Z #define __INTMAX_TYPE__ long int 2025-05-07T19:44:47.1916636Z #define __DEC128_MAX_EXP__ 6145 2025-05-07T19:44:47.1916907Z #define __FLT32X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1917203Z #define __ATOMIC_CONSUME 1 2025-05-07T19:44:47.1917453Z #define __GNUC_MINOR__ 4 2025-05-07T19:44:47.1917731Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:44:47.1918027Z #define __INT_FAST16_WIDTH__ 64 2025-05-07T19:44:47.1918339Z #define __UINTMAX_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1918654Z #define __PIE__ 2 2025-05-07T19:44:47.1918972Z #define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x 2025-05-07T19:44:47.1919408Z #define __cpp_template_template_args 201611L 2025-05-07T19:44:47.1919711Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:44:47.1920145Z #define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L 2025-05-07T19:44:47.1920512Z #define __INT16_C(c) c 2025-05-07T19:44:47.1920754Z #define __STDC__ 1 2025-05-07T19:44:47.1920965Z #define __FLT32X_DIG__ 15 2025-05-07T19:44:47.1921226Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:44:47.1921495Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:44:47.1921755Z #define __FLT32X_MIN_10_EXP__ (-307) 2025-05-07T19:44:47.1922068Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:44:47.1922412Z #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD 2025-05-07T19:44:47.1922767Z #define __DEC128_MANT_DIG__ 34 2025-05-07T19:44:47.1923034Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:44:47.1923340Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:44:47.1923617Z #define __SSE_MATH__ 1 2025-05-07T19:44:47.1923870Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:44:47.1924150Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:44:47.1924468Z #define __FLT128_DECIMAL_DIG__ 36 2025-05-07T19:44:47.1924767Z #define 
__GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:44:47.1925053Z #define __FLT32_HAS_QUIET_NAN__ 1 2025-05-07T19:44:47.1925344Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:44:47.1925635Z #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:47.1926043Z #define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:47.1926415Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:44:47.1926730Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:44:47.1927016Z #define _GNU_SOURCE 1 2025-05-07T19:44:47.1927275Z #define __cpp_init_captures 201304L 2025-05-07T19:44:47.1927553Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:44:47.1927813Z #define __ATOMIC_RELEASE 3 2025-05-07T19:44:47.1927971Z 2025-05-07T19:44:47.2524914Z 2025-05-07T19:44:47.2525653Z + conda run -n build_binary c++ --version 2025-05-07T19:44:47.2525930Z 2025-05-07T19:44:49.0566201Z c++ (conda-forge gcc 11.4.0-13) 11.4.0 2025-05-07T19:44:49.0566660Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:44:49.0567194Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:44:49.0567796Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:44:49.0568417Z 2025-05-07T19:44:49.0568422Z 2025-05-07T19:44:49.1339363Z 2025-05-07T19:44:49.1340892Z [INFO] Printing the default version of the C standard used by the compiler ... 2025-05-07T19:44:49.1341568Z + conda run -n build_binary cc -dM -E - < /dev/null | grep __STDC_VERSION__ 2025-05-07T19:44:49.1341952Z 2025-05-07T19:44:51.0264612Z #define __STDC_VERSION__ 201710L 2025-05-07T19:44:51.0270140Z 2025-05-07T19:44:51.0270659Z [INFO] Printing the default version of the C++ standard used by the compiler ... 
2025-05-07T19:44:51.0271300Z + conda run -n build_binary c++ -dM -E -x c++ - < /dev/null | grep __cplusplus 2025-05-07T19:44:51.0271691Z 2025-05-07T19:44:52.9199539Z #define __cplusplus 201703L 2025-05-07T19:44:52.9201865Z 2025-05-07T19:44:52.9202372Z [INSTALL] Successfully installed C/C++ compilers 2025-05-07T19:44:52.9282696Z ##[group]Run . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:44:52.9283171Z . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:44:52.9283725Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:52.9284057Z env: 2025-05-07T19:44:52.9284288Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:52.9284587Z BUILD_ENV: build_binary 2025-05-07T19:44:52.9284843Z BUILD_TARGET: genai 2025-05-07T19:44:52.9285069Z BUILD_VARIANT: cuda 2025-05-07T19:44:52.9285313Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:52.9285566Z ##[endgroup] 2025-05-07T19:44:53.3837912Z ################################################################################ 2025-05-07T19:44:53.3838326Z # Install Build Tools 2025-05-07T19:44:53.3838560Z # 2025-05-07T19:44:53.3858925Z # [2025-05-07T19:44:53.385Z] + install_build_tools build_binary 2025-05-07T19:44:53.3860439Z ################################################################################ 2025-05-07T19:44:53.3861223Z 2025-05-07T19:44:53.3873853Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:53.4727078Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:53.4736555Z [INSTALL] Installing build tools ... 
2025-05-07T19:44:53.4760900Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y auditwheel bazel cmake>=3.30 hypothesis jinja2 make ncurses ninja openblas patchelf rhash scikit-build wheel pyyaml 2025-05-07T19:44:54.1781421Z Channels: 2025-05-07T19:44:54.1782106Z - conda-forge 2025-05-07T19:44:54.1782757Z Platform: linux-64 2025-05-07T19:44:57.2558579Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:45:00.4998876Z Solving environment: \ | / - done 2025-05-07T19:45:00.5522922Z 2025-05-07T19:45:00.5523378Z ## Package Plan ## 2025-05-07T19:45:00.5524292Z 2025-05-07T19:45:00.5525312Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:00.5526345Z 2025-05-07T19:45:00.5526662Z added / updated specs: 2025-05-07T19:45:00.5527415Z - auditwheel 2025-05-07T19:45:00.5528045Z - bazel 2025-05-07T19:45:00.5528669Z - cmake[version='>=3.30'] 2025-05-07T19:45:00.5529422Z - hypothesis 2025-05-07T19:45:00.5530104Z - jinja2 2025-05-07T19:45:00.5530319Z - make 2025-05-07T19:45:00.5530540Z - ncurses 2025-05-07T19:45:00.5530749Z - ninja 2025-05-07T19:45:00.5530974Z - openblas 2025-05-07T19:45:00.5531190Z - patchelf 2025-05-07T19:45:00.5531430Z - pyyaml 2025-05-07T19:45:00.5531739Z - rhash 2025-05-07T19:45:00.5531950Z - scikit-build 2025-05-07T19:45:00.5532196Z - wheel 2025-05-07T19:45:00.5532316Z 2025-05-07T19:45:00.5532320Z 2025-05-07T19:45:00.5532468Z The following packages will be downloaded: 2025-05-07T19:45:00.5532697Z 2025-05-07T19:45:00.5532819Z package | build 2025-05-07T19:45:00.5533294Z ---------------------------|----------------- 2025-05-07T19:45:00.5533677Z alsa-lib-1.2.14 | hb9d3cd8_0 553 KB conda-forge 2025-05-07T19:45:00.5534128Z attrs-25.3.0 | pyh71513ae_0 56 KB conda-forge 2025-05-07T19:45:00.5534889Z auditwheel-6.2.0 | pyha804496_1 40 KB conda-forge 2025-05-07T19:45:00.5535325Z bazel-7.5.0 | h96810dc_2 47.4 MB conda-forge 2025-05-07T19:45:00.5535733Z c-ares-1.34.5 | hb9d3cd8_0 
202 KB conda-forge 2025-05-07T19:45:00.5536124Z cairo-1.18.4 | h3394656_0 955 KB conda-forge 2025-05-07T19:45:00.5536527Z click-8.1.8 | pyh707e725_0 83 KB conda-forge 2025-05-07T19:45:00.5536917Z cmake-4.0.2 | h74e3db0_0 19.4 MB conda-forge 2025-05-07T19:45:00.5537335Z distro-1.9.0 | pyhd8ed1ab_1 41 KB conda-forge 2025-05-07T19:45:00.5537983Z exceptiongroup-1.2.2 | pyhd8ed1ab_1 20 KB conda-forge 2025-05-07T19:45:00.5538422Z expat-2.7.0 | h5888daf_0 137 KB conda-forge 2025-05-07T19:45:00.5538919Z font-ttf-dejavu-sans-mono-2.37| hab24e00_0 388 KB conda-forge 2025-05-07T19:45:00.5539563Z font-ttf-inconsolata-3.000 | h77eed37_0 94 KB conda-forge 2025-05-07T19:45:00.5540356Z font-ttf-source-code-pro-2.038| h77eed37_0 684 KB conda-forge 2025-05-07T19:45:00.5540907Z font-ttf-ubuntu-0.83 | h77eed37_3 1.5 MB conda-forge 2025-05-07T19:45:00.5541443Z fontconfig-2.15.0 | h7e30c49_1 259 KB conda-forge 2025-05-07T19:45:00.5541986Z fonts-conda-ecosystem-1 | 0 4 KB conda-forge 2025-05-07T19:45:00.5542514Z fonts-conda-forge-1 | 0 4 KB conda-forge 2025-05-07T19:45:00.5543030Z freetype-2.13.3 | ha770c72_1 168 KB conda-forge 2025-05-07T19:45:00.5543498Z giflib-5.2.2 | hd590300_0 75 KB conda-forge 2025-05-07T19:45:00.5543986Z graphite2-1.3.13 | h59595ed_1003 95 KB conda-forge 2025-05-07T19:45:00.5544490Z harfbuzz-11.0.0 | h76408a6_0 1.6 MB conda-forge 2025-05-07T19:45:00.5544961Z hypothesis-6.131.14 | pyha770c72_0 348 KB conda-forge 2025-05-07T19:45:00.5545424Z icu-75.1 | he02047a_0 11.6 MB conda-forge 2025-05-07T19:45:00.5545837Z ijar-7.5.0 | h5888daf_0 114 KB conda-forge 2025-05-07T19:45:00.5546378Z jinja2-3.1.6 | pyhd8ed1ab_0 110 KB conda-forge 2025-05-07T19:45:00.5546797Z keyutils-1.6.1 | h166bdaf_0 115 KB conda-forge 2025-05-07T19:45:00.5547213Z krb5-1.21.3 | h659f571_0 1.3 MB conda-forge 2025-05-07T19:45:00.5547613Z lcms2-2.17 | h717163a_0 242 KB conda-forge 2025-05-07T19:45:00.5548000Z lerc-4.0.0 | h0aef613_1 258 KB conda-forge 2025-05-07T19:45:00.5548477Z libabseil-20250127.1 | 
cxx17_hbbce691_0 1.3 MB conda-forge 2025-05-07T19:45:00.5548949Z libcups-2.3.3 | h4637d8d_4 4.3 MB conda-forge 2025-05-07T19:45:00.5549404Z libcurl-8.13.0 | h332b0f4_0 428 KB conda-forge 2025-05-07T19:45:00.5549838Z libdeflate-1.23 | h86f0d12_0 71 KB conda-forge 2025-05-07T19:45:00.5550334Z libedit-3.1.20250104 | pl5321h7949ede_0 132 KB conda-forge 2025-05-07T19:45:00.5550806Z libev-4.33 | hd590300_2 110 KB conda-forge 2025-05-07T19:45:00.5551222Z libexpat-2.7.0 | h5888daf_0 73 KB conda-forge 2025-05-07T19:45:00.5551699Z libfreetype-2.13.3 | ha770c72_1 8 KB conda-forge 2025-05-07T19:45:00.5552168Z libfreetype6-2.13.3 | h48d6fc4_1 371 KB conda-forge 2025-05-07T19:45:00.5552662Z libgfortran-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:45:00.5553246Z libgfortran5-15.1.0 | hcea5267_2 1.5 MB conda-forge 2025-05-07T19:45:00.5553699Z libglib-2.84.0 | h2ff4ddf_0 3.8 MB conda-forge 2025-05-07T19:45:00.5554160Z libgrpc-1.71.0 | h8e591d7_1 7.6 MB conda-forge 2025-05-07T19:45:00.5554586Z libiconv-1.18 | h4ce23a2_1 696 KB conda-forge 2025-05-07T19:45:00.5555071Z libjpeg-turbo-3.1.0 | hb9d3cd8_0 614 KB conda-forge 2025-05-07T19:45:00.5555519Z liblzma-5.8.1 | hb9d3cd8_1 110 KB conda-forge 2025-05-07T19:45:00.5555992Z liblzma-devel-5.8.1 | hb9d3cd8_1 431 KB conda-forge 2025-05-07T19:45:00.5556481Z libnghttp2-1.64.0 | h161d5f1_0 632 KB conda-forge 2025-05-07T19:45:00.5557062Z libopenblas-0.3.29 |pthreads_h94d23a6_0 5.6 MB conda-forge 2025-05-07T19:45:00.5557559Z libpng-1.6.47 | h943b412_0 282 KB conda-forge 2025-05-07T19:45:00.5558004Z libprotobuf-5.29.3 | h501fc15_1 3.2 MB conda-forge 2025-05-07T19:45:00.5558488Z libre2-11-2024.07.02 | hba17884_3 205 KB conda-forge 2025-05-07T19:45:00.5558932Z libsqlite-3.49.2 | hee588c1_0 895 KB conda-forge 2025-05-07T19:45:00.5559395Z libssh2-1.11.1 | hcf80075_0 298 KB conda-forge 2025-05-07T19:45:00.5559844Z libtiff-4.7.0 | hd9ff511_4 419 KB conda-forge 2025-05-07T19:45:00.5560266Z libuuid-2.38.1 | h0b41bf4_0 33 KB conda-forge 
2025-05-07T19:45:00.5560709Z libuv-1.50.0 | hb9d3cd8_0 870 KB conda-forge 2025-05-07T19:45:00.5561142Z libwebp-base-1.5.0 | h851e524_0 420 KB conda-forge 2025-05-07T19:45:00.5561601Z libxcb-1.17.0 | h8a09558_0 387 KB conda-forge 2025-05-07T19:45:00.5562049Z libzlib-1.3.1 | hb9d3cd8_2 60 KB conda-forge 2025-05-07T19:45:00.5562459Z make-4.4.1 | hb9d3cd8_2 501 KB conda-forge 2025-05-07T19:45:00.5562923Z markupsafe-3.0.2 | py313h8060acc_1 24 KB conda-forge 2025-05-07T19:45:00.5563364Z ncurses-6.5 | h2d0b736_3 871 KB conda-forge 2025-05-07T19:45:00.5563806Z ninja-1.12.1 | hff21bea_1 158 KB conda-forge 2025-05-07T19:45:00.5564256Z openblas-0.3.29 |pthreads_h6ec200e_0 5.8 MB conda-forge 2025-05-07T19:45:00.5564742Z openjdk-23.0.2 | h53dfc1b_2 181.4 MB conda-forge 2025-05-07T19:45:00.5565221Z packaging-25.0 | pyh29332c3_1 61 KB conda-forge 2025-05-07T19:45:00.5565666Z patchelf-0.18.0 | h3f2d84a_2 133 KB conda-forge 2025-05-07T19:45:00.5566112Z pcre2-10.44 | hc749103_2 934 KB conda-forge 2025-05-07T19:45:00.5566529Z pixman-0.46.0 | h29eaf8c_0 389 KB conda-forge 2025-05-07T19:45:00.5566977Z pthread-stubs-0.4 | hb9d3cd8_1002 8 KB conda-forge 2025-05-07T19:45:00.5567421Z pyelftools-0.32 | pyh707e725_1 146 KB conda-forge 2025-05-07T19:45:00.5567870Z python-3.13.2 |hf636f53_101_cp313 31.7 MB conda-forge 2025-05-07T19:45:00.5568305Z pyyaml-6.0.2 | py313h8060acc_2 201 KB conda-forge 2025-05-07T19:45:00.5568713Z re2-2024.07.02 | h9925aae_3 26 KB conda-forge 2025-05-07T19:45:00.5569124Z rhash-1.4.5 | hb9d3cd8_0 183 KB conda-forge 2025-05-07T19:45:00.5569550Z scikit-build-0.18.1 | pyhae55e72_2 114 KB conda-forge 2025-05-07T19:45:00.5570005Z singlejar-7.5.0 | h0e684df_1 122 KB conda-forge 2025-05-07T19:45:00.5570567Z sortedcontainers-2.4.0 | pyhd8ed1ab_1 28 KB conda-forge 2025-05-07T19:45:00.5571023Z sqlite-3.49.2 | h9eae976_0 840 KB conda-forge 2025-05-07T19:45:00.5571448Z tk-8.6.13 |noxft_h4845f30_101 3.2 MB conda-forge 2025-05-07T19:45:00.5571850Z tomli-2.2.1 | pyhd8ed1ab_1 19 KB 
conda-forge 2025-05-07T19:45:00.5572279Z wheel-0.45.1 | pyhd8ed1ab_1 61 KB conda-forge 2025-05-07T19:45:00.5572706Z xorg-libice-1.1.2 | hb9d3cd8_0 57 KB conda-forge 2025-05-07T19:45:00.5573165Z xorg-libsm-1.2.6 | he73a12e_0 27 KB conda-forge 2025-05-07T19:45:00.5573622Z xorg-libx11-1.8.12 | h4f16b4b_0 816 KB conda-forge 2025-05-07T19:45:00.5574134Z xorg-libxau-1.0.12 | hb9d3cd8_0 14 KB conda-forge 2025-05-07T19:45:00.5574610Z xorg-libxdmcp-1.1.5 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:00.5575072Z xorg-libxext-1.3.6 | hb9d3cd8_0 49 KB conda-forge 2025-05-07T19:45:00.5575552Z xorg-libxfixes-6.0.1 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:00.5576001Z xorg-libxi-1.8.2 | hb9d3cd8_0 46 KB conda-forge 2025-05-07T19:45:00.5576471Z xorg-libxrandr-1.5.4 | hb9d3cd8_0 29 KB conda-forge 2025-05-07T19:45:00.5576963Z xorg-libxrender-0.9.12 | hb9d3cd8_0 32 KB conda-forge 2025-05-07T19:45:00.5577417Z xorg-libxt-1.3.1 | hb9d3cd8_0 371 KB conda-forge 2025-05-07T19:45:00.5577880Z xorg-libxtst-1.2.5 | hb9d3cd8_3 32 KB conda-forge 2025-05-07T19:45:00.5578298Z xz-5.8.1 | hbcc6ac9_1 23 KB conda-forge 2025-05-07T19:45:00.5578727Z xz-gpl-tools-5.8.1 | hbcc6ac9_1 33 KB conda-forge 2025-05-07T19:45:00.5579183Z xz-tools-5.8.1 | hb9d3cd8_1 94 KB conda-forge 2025-05-07T19:45:00.5579698Z yaml-0.2.5 | h7f98852_2 87 KB conda-forge 2025-05-07T19:45:00.5580301Z zlib-1.3.1 | hb9d3cd8_2 90 KB conda-forge 2025-05-07T19:45:00.5580809Z zstd-1.5.7 | hb8e6e7a_2 554 KB conda-forge 2025-05-07T19:45:00.5581235Z ------------------------------------------------------------ 2025-05-07T19:45:00.5581598Z Total: 351.6 MB 2025-05-07T19:45:00.5581844Z 2025-05-07T19:45:00.5581981Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:00.5582223Z 2025-05-07T19:45:00.5582457Z alsa-lib conda-forge/linux-64::alsa-lib-1.2.14-hb9d3cd8_0 2025-05-07T19:45:00.5582929Z attrs conda-forge/noarch::attrs-25.3.0-pyh71513ae_0 2025-05-07T19:45:00.5583430Z auditwheel 
conda-forge/noarch::auditwheel-6.2.0-pyha804496_1 2025-05-07T19:45:00.5583917Z bazel conda-forge/linux-64::bazel-7.5.0-h96810dc_2 2025-05-07T19:45:00.5584385Z c-ares conda-forge/linux-64::c-ares-1.34.5-hb9d3cd8_0 2025-05-07T19:45:00.5584840Z cairo conda-forge/linux-64::cairo-1.18.4-h3394656_0 2025-05-07T19:45:00.5603004Z click conda-forge/noarch::click-8.1.8-pyh707e725_0 2025-05-07T19:45:00.5603503Z cmake conda-forge/linux-64::cmake-4.0.2-h74e3db0_0 2025-05-07T19:45:00.5603986Z distro conda-forge/noarch::distro-1.9.0-pyhd8ed1ab_1 2025-05-07T19:45:00.5604526Z exceptiongroup conda-forge/noarch::exceptiongroup-1.2.2-pyhd8ed1ab_1 2025-05-07T19:45:00.5605192Z font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0 2025-05-07T19:45:00.5605876Z font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0 2025-05-07T19:45:00.5606576Z font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0 2025-05-07T19:45:00.5607623Z font-ttf-ubuntu conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3 2025-05-07T19:45:00.5608159Z fontconfig conda-forge/linux-64::fontconfig-2.15.0-h7e30c49_1 2025-05-07T19:45:00.5608714Z fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0 2025-05-07T19:45:00.5609235Z fonts-conda-forge conda-forge/noarch::fonts-conda-forge-1-0 2025-05-07T19:45:00.5609750Z freetype conda-forge/linux-64::freetype-2.13.3-ha770c72_1 2025-05-07T19:45:00.5610234Z giflib conda-forge/linux-64::giflib-5.2.2-hd590300_0 2025-05-07T19:45:00.5610833Z graphite2 conda-forge/linux-64::graphite2-1.3.13-h59595ed_1003 2025-05-07T19:45:00.5611319Z harfbuzz conda-forge/linux-64::harfbuzz-11.0.0-h76408a6_0 2025-05-07T19:45:00.5611913Z hypothesis conda-forge/noarch::hypothesis-6.131.14-pyha770c72_0 2025-05-07T19:45:00.5612378Z icu conda-forge/linux-64::icu-75.1-he02047a_0 2025-05-07T19:45:00.5612770Z ijar conda-forge/linux-64::ijar-7.5.0-h5888daf_0 2025-05-07T19:45:00.5613197Z jinja2 
conda-forge/noarch::jinja2-3.1.6-pyhd8ed1ab_0 2025-05-07T19:45:00.5613833Z keyutils conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 2025-05-07T19:45:00.5614273Z krb5 conda-forge/linux-64::krb5-1.21.3-h659f571_0 2025-05-07T19:45:00.5614708Z lcms2 conda-forge/linux-64::lcms2-2.17-h717163a_0 2025-05-07T19:45:00.5615125Z lerc conda-forge/linux-64::lerc-4.0.0-h0aef613_1 2025-05-07T19:45:00.5615818Z libabseil conda-forge/linux-64::libabseil-20250127.1-cxx17_hbbce691_0 2025-05-07T19:45:00.5616380Z libcups conda-forge/linux-64::libcups-2.3.3-h4637d8d_4 2025-05-07T19:45:00.5616913Z libcurl conda-forge/linux-64::libcurl-8.13.0-h332b0f4_0 2025-05-07T19:45:00.5617423Z libdeflate conda-forge/linux-64::libdeflate-1.23-h86f0d12_0 2025-05-07T19:45:00.5617953Z libedit conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0 2025-05-07T19:45:00.5618472Z libev conda-forge/linux-64::libev-4.33-hd590300_2 2025-05-07T19:45:00.5618941Z libexpat conda-forge/linux-64::libexpat-2.7.0-h5888daf_0 2025-05-07T19:45:00.5619564Z libfreetype conda-forge/linux-64::libfreetype-2.13.3-ha770c72_1 2025-05-07T19:45:00.5620150Z libfreetype6 conda-forge/linux-64::libfreetype6-2.13.3-h48d6fc4_1 2025-05-07T19:45:00.5620740Z libgfortran conda-forge/linux-64::libgfortran-15.1.0-h69a702a_2 2025-05-07T19:45:00.5621300Z libgfortran5 conda-forge/linux-64::libgfortran5-15.1.0-hcea5267_2 2025-05-07T19:45:00.5621829Z libglib conda-forge/linux-64::libglib-2.84.0-h2ff4ddf_0 2025-05-07T19:45:00.5622300Z libgrpc conda-forge/linux-64::libgrpc-1.71.0-h8e591d7_1 2025-05-07T19:45:00.5622800Z libiconv conda-forge/linux-64::libiconv-1.18-h4ce23a2_1 2025-05-07T19:45:00.5623323Z libjpeg-turbo conda-forge/linux-64::libjpeg-turbo-3.1.0-hb9d3cd8_0 2025-05-07T19:45:00.5623865Z liblzma conda-forge/linux-64::liblzma-5.8.1-hb9d3cd8_1 2025-05-07T19:45:00.5624394Z liblzma-devel conda-forge/linux-64::liblzma-devel-5.8.1-hb9d3cd8_1 2025-05-07T19:45:00.5624933Z libnghttp2 conda-forge/linux-64::libnghttp2-1.64.0-h161d5f1_0 
2025-05-07T19:45:00.5625524Z libopenblas conda-forge/linux-64::libopenblas-0.3.29-pthreads_h94d23a6_0 2025-05-07T19:45:00.5626064Z libpng conda-forge/linux-64::libpng-1.6.47-h943b412_0 2025-05-07T19:45:00.5626579Z libprotobuf conda-forge/linux-64::libprotobuf-5.29.3-h501fc15_1 2025-05-07T19:45:00.5627117Z libre2-11 conda-forge/linux-64::libre2-11-2024.07.02-hba17884_3 2025-05-07T19:45:00.5627622Z libsqlite conda-forge/linux-64::libsqlite-3.49.2-hee588c1_0 2025-05-07T19:45:00.5628132Z libssh2 conda-forge/linux-64::libssh2-1.11.1-hcf80075_0 2025-05-07T19:45:00.5628598Z libtiff conda-forge/linux-64::libtiff-4.7.0-hd9ff511_4 2025-05-07T19:45:00.5629180Z libuv conda-forge/linux-64::libuv-1.50.0-hb9d3cd8_0 2025-05-07T19:45:00.5629695Z libwebp-base conda-forge/linux-64::libwebp-base-1.5.0-h851e524_0 2025-05-07T19:45:00.5630193Z libxcb conda-forge/linux-64::libxcb-1.17.0-h8a09558_0 2025-05-07T19:45:00.5630674Z libzlib conda-forge/linux-64::libzlib-1.3.1-hb9d3cd8_2 2025-05-07T19:45:00.5631121Z make conda-forge/linux-64::make-4.4.1-hb9d3cd8_2 2025-05-07T19:45:00.5631634Z markupsafe conda-forge/linux-64::markupsafe-3.0.2-py313h8060acc_1 2025-05-07T19:45:00.5632137Z ninja conda-forge/linux-64::ninja-1.12.1-hff21bea_1 2025-05-07T19:45:00.5632661Z openblas conda-forge/linux-64::openblas-0.3.29-pthreads_h6ec200e_0 2025-05-07T19:45:00.5633285Z openjdk conda-forge/linux-64::openjdk-23.0.2-h53dfc1b_2 2025-05-07T19:45:00.5633777Z packaging conda-forge/noarch::packaging-25.0-pyh29332c3_1 2025-05-07T19:45:00.5634298Z patchelf conda-forge/linux-64::patchelf-0.18.0-h3f2d84a_2 2025-05-07T19:45:00.5634763Z pcre2 conda-forge/linux-64::pcre2-10.44-hc749103_2 2025-05-07T19:45:00.5635235Z pixman conda-forge/linux-64::pixman-0.46.0-h29eaf8c_0 2025-05-07T19:45:00.5635775Z pthread-stubs conda-forge/linux-64::pthread-stubs-0.4-hb9d3cd8_1002 2025-05-07T19:45:00.5636324Z pyelftools conda-forge/noarch::pyelftools-0.32-pyh707e725_1 2025-05-07T19:45:00.5636843Z pyyaml 
conda-forge/linux-64::pyyaml-6.0.2-py313h8060acc_2 2025-05-07T19:45:00.5637299Z re2 conda-forge/linux-64::re2-2024.07.02-h9925aae_3 2025-05-07T19:45:00.5637755Z rhash conda-forge/linux-64::rhash-1.4.5-hb9d3cd8_0 2025-05-07T19:45:00.5638280Z scikit-build conda-forge/noarch::scikit-build-0.18.1-pyhae55e72_2 2025-05-07T19:45:00.5638806Z singlejar conda-forge/linux-64::singlejar-7.5.0-h0e684df_1 2025-05-07T19:45:00.5639401Z sortedcontainers conda-forge/noarch::sortedcontainers-2.4.0-pyhd8ed1ab_1 2025-05-07T19:45:00.5639947Z tomli conda-forge/noarch::tomli-2.2.1-pyhd8ed1ab_1 2025-05-07T19:45:00.5640453Z xorg-libice conda-forge/linux-64::xorg-libice-1.1.2-hb9d3cd8_0 2025-05-07T19:45:00.5640993Z xorg-libsm conda-forge/linux-64::xorg-libsm-1.2.6-he73a12e_0 2025-05-07T19:45:00.5641509Z xorg-libx11 conda-forge/linux-64::xorg-libx11-1.8.12-h4f16b4b_0 2025-05-07T19:45:00.5642050Z xorg-libxau conda-forge/linux-64::xorg-libxau-1.0.12-hb9d3cd8_0 2025-05-07T19:45:00.5642597Z xorg-libxdmcp conda-forge/linux-64::xorg-libxdmcp-1.1.5-hb9d3cd8_0 2025-05-07T19:45:00.5643169Z xorg-libxext conda-forge/linux-64::xorg-libxext-1.3.6-hb9d3cd8_0 2025-05-07T19:45:00.5643754Z xorg-libxfixes conda-forge/linux-64::xorg-libxfixes-6.0.1-hb9d3cd8_0 2025-05-07T19:45:00.5644296Z xorg-libxi conda-forge/linux-64::xorg-libxi-1.8.2-hb9d3cd8_0 2025-05-07T19:45:00.5644853Z xorg-libxrandr conda-forge/linux-64::xorg-libxrandr-1.5.4-hb9d3cd8_0 2025-05-07T19:45:00.5645440Z xorg-libxrender conda-forge/linux-64::xorg-libxrender-0.9.12-hb9d3cd8_0 2025-05-07T19:45:00.5646009Z xorg-libxt conda-forge/linux-64::xorg-libxt-1.3.1-hb9d3cd8_0 2025-05-07T19:45:00.5646549Z xorg-libxtst conda-forge/linux-64::xorg-libxtst-1.2.5-hb9d3cd8_3 2025-05-07T19:45:00.5647083Z xz-gpl-tools conda-forge/linux-64::xz-gpl-tools-5.8.1-hbcc6ac9_1 2025-05-07T19:45:00.5647607Z xz-tools conda-forge/linux-64::xz-tools-5.8.1-hb9d3cd8_1 2025-05-07T19:45:00.5648054Z yaml conda-forge/linux-64::yaml-0.2.5-h7f98852_2 2025-05-07T19:45:00.5648494Z zstd 
conda-forge/linux-64::zstd-1.5.7-hb8e6e7a_2 2025-05-07T19:45:00.5648761Z 2025-05-07T19:45:00.5648906Z The following packages will be UPDATED: 2025-05-07T19:45:00.5649135Z 2025-05-07T19:45:00.5649438Z libuuid pkgs/main::libuuid-1.41.5-h5eee18b_0 --> conda-forge::libuuid-2.38.1-h0b41bf4_0 2025-05-07T19:45:00.5650219Z ncurses pkgs/main::ncurses-6.4-h6a678d5_0 --> conda-forge::ncurses-6.5-h2d0b736_3 2025-05-07T19:45:00.5650928Z python pkgs/main::python-3.13.2-hf623796_100~ --> conda-forge::python-3.13.2-hf636f53_101_cp313 2025-05-07T19:45:00.5651646Z sqlite pkgs/main::sqlite-3.45.3-h5eee18b_0 --> conda-forge::sqlite-3.49.2-h9eae976_0 2025-05-07T19:45:00.5652366Z wheel pkgs/main/linux-64::wheel-0.45.1-py31~ --> conda-forge/noarch::wheel-0.45.1-pyhd8ed1ab_1 2025-05-07T19:45:00.5653012Z xz pkgs/main::xz-5.6.4-h5eee18b_1 --> conda-forge::xz-5.8.1-hbcc6ac9_1 2025-05-07T19:45:00.5653618Z zlib pkgs/main::zlib-1.2.13-h5eee18b_1 --> conda-forge::zlib-1.3.1-hb9d3cd8_2 2025-05-07T19:45:00.5653984Z 2025-05-07T19:45:00.5654316Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:45:00.5654668Z 2025-05-07T19:45:00.5654932Z expat pkgs/main::expat-2.7.1-h6a678d5_0 --> conda-forge::expat-2.7.0-h5888daf_0 2025-05-07T19:45:00.5655559Z tk pkgs/main::tk-8.6.14-h39e8969_0 --> conda-forge::tk-8.6.13-noxft_h4845f30_101 2025-05-07T19:45:00.5655915Z 2025-05-07T19:45:00.5655968Z 2025-05-07T19:45:00.5655972Z 2025-05-07T19:45:00.5656150Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:45:00.5656555Z openjdk-23.0.2 | 181.4 MB | | 0% 2025-05-07T19:45:00.5657164Z bazel-7.5.0 | 47.4 MB | | 0% 2025-05-07T19:45:00.5657654Z python-3.13.2 | 31.7 MB | | 0% 2025-05-07T19:45:00.5658143Z cmake-4.0.2 | 19.4 MB | | 0% 2025-05-07T19:45:00.5658647Z icu-75.1 | 11.6 MB | | 0% 2025-05-07T19:45:00.5659187Z libgrpc-1.71.0 | 7.6 MB | | 0% 2025-05-07T19:45:00.5659876Z openblas-0.3.29 | 5.8 MB | | 0% 2025-05-07T19:45:00.5660494Z libopenblas-0.3.29 | 5.6 MB | | 0% 2025-05-07T19:45:00.5661089Z libcups-2.3.3 | 4.3 MB | | 0% 2025-05-07T19:45:00.5661726Z libglib-2.84.0 | 3.8 MB | | 0%
2025-05-07T19:45:00.5662343Z libprotobuf-5.29.3 | 3.2 MB | | 0% 2025-05-07T19:45:00.5663027Z tk-8.6.13 | 3.2 MB | | 0% 2025-05-07T19:45:00.5663695Z harfbuzz-11.0.0 | 1.6 MB | | 0% 2025-05-07T19:45:00.5664361Z font-ttf-ubuntu-0.83 | 1.5 MB | | 0% 2025-05-07T19:45:00.5665076Z libgfortran5-15.1.0 | 1.5 MB | | 0%
2025-05-07T19:45:00.5665749Z krb5-1.21.3 | 1.3 MB | | 0% 2025-05-07T19:45:00.5666466Z libabseil-20250127.1 | 1.3 MB | | 0% 2025-05-07T19:45:00.5667194Z cairo-1.18.4 | 955 KB | | 0%
2025-05-07T19:45:00.5667669Z 2025-05-07T19:45:00.5667672Z 2025-05-07T19:45:00.5667675Z 2025-05-07T19:45:00.5667991Z pcre2-10.44 | 934 KB | | 0%  2025-05-07T19:45:00.5668304Z 2025-05-07T19:45:00.5668308Z 2025-05-07T19:45:00.5668311Z 2025-05-07T19:45:00.5668315Z 2025-05-07T19:45:00.5668318Z 2025-05-07T19:45:00.5668321Z 2025-05-07T19:45:00.5668325Z 2025-05-07T19:45:00.5668328Z 2025-05-07T19:45:00.5668331Z 2025-05-07T19:45:00.5668334Z 2025-05-07T19:45:00.5668338Z 2025-05-07T19:45:00.5668341Z 2025-05-07T19:45:00.5668345Z 2025-05-07T19:45:00.5668348Z 2025-05-07T19:45:00.5668414Z 2025-05-07T19:45:00.5668435Z 2025-05-07T19:45:00.5668439Z 2025-05-07T19:45:00.5668442Z 2025-05-07T19:45:00.5668448Z 2025-05-07T19:45:00.6765544Z ... (more hidden) ... 2025-05-07T19:45:00.6766469Z 2025-05-07T19:45:00.6766500Z 2025-05-07T19:45:00.6766511Z 2025-05-07T19:45:00.6817860Z 2025-05-07T19:45:00.8177866Z icu-75.1 | 11.6 MB | 1 | 1%  2025-05-07T19:45:00.8178671Z 2025-05-07T19:45:00.8178684Z 2025-05-07T19:45:00.8178695Z 2025-05-07T19:45:00.8178705Z 2025-05-07T19:45:00.9143749Z icu-75.1 | 11.6 MB | 2 | 2%  2025-05-07T19:45:00.9241944Z openjdk-23.0.2 | 181.4 MB | | 0% 2025-05-07T19:45:00.9242267Z 2025-05-07T19:45:00.9242616Z 2025-05-07T19:45:00.9242630Z 2025-05-07T19:45:00.9242640Z 2025-05-07T19:45:00.9345427Z icu-75.1 | 11.6 MB | #9 | 20%  2025-05-07T19:45:00.9345776Z 2025-05-07T19:45:00.9382391Z bazel-7.5.0 | 47.4 MB | | 0%  2025-05-07T19:45:00.9383712Z 2025-05-07T19:45:00.9383726Z 2025-05-07T19:45:00.9383758Z 2025-05-07T19:45:00.9729430Z cmake-4.0.2 | 19.4 MB | | 0%  2025-05-07T19:45:00.9729877Z 2025-05-07T19:45:00.9729882Z 2025-05-07T19:45:01.0145688Z python-3.13.2 | 31.7 MB | | 0%  2025-05-07T19:45:01.0244361Z openjdk-23.0.2 | 181.4 MB | 5 | 5% 2025-05-07T19:45:01.0244847Z 2025-05-07T19:45:01.0244857Z 2025-05-07T19:45:01.0244901Z 2025-05-07T19:45:01.0346876Z 2025-05-07T19:45:01.0347339Z icu-75.1 | 11.6 MB | ###7 | 38%  2025-05-07T19:45:01.0347622Z 2025-05-07T19:45:01.0382983Z bazel-7.5.0 | 
47.4 MB | 8 | 9%  2025-05-07T19:45:01.0383762Z 2025-05-07T19:45:01.0383775Z 2025-05-07T19:45:01.0385218Z 2025-05-07T19:45:01.0728441Z cmake-4.0.2 | 19.4 MB | ##6 | 27%  2025-05-07T19:45:01.0728750Z 2025-05-07T19:45:01.0728773Z 2025-05-07T19:45:01.1300063Z python-3.13.2 | 31.7 MB | #6 | 16%  2025-05-07T19:45:01.1331260Z openjdk-23.0.2 | 181.4 MB | 8 | 8% 2025-05-07T19:45:01.1331561Z 2025-05-07T19:45:01.1331566Z 2025-05-07T19:45:01.1331569Z 2025-05-07T19:45:01.1331573Z 2025-05-07T19:45:01.1348947Z icu-75.1 | 11.6 MB | #######4 | 75%  2025-05-07T19:45:01.1349280Z 2025-05-07T19:45:01.1381333Z bazel-7.5.0 | 47.4 MB | #8 | 19%  2025-05-07T19:45:01.1381603Z 2025-05-07T19:45:01.1381608Z 2025-05-07T19:45:01.1381611Z 2025-05-07T19:45:01.1729936Z cmake-4.0.2 | 19.4 MB | #####2 | 52%  2025-05-07T19:45:01.1730261Z 2025-05-07T19:45:01.1730272Z 2025-05-07T19:45:01.2302140Z python-3.13.2 | 31.7 MB | ###3 | 33%  2025-05-07T19:45:01.2352023Z openjdk-23.0.2 | 181.4 MB | #1 | 12% 2025-05-07T19:45:01.2352515Z 2025-05-07T19:45:01.2382370Z bazel-7.5.0 | 47.4 MB | ###1 | 32%  2025-05-07T19:45:01.2382732Z 2025-05-07T19:45:01.2382737Z 2025-05-07T19:45:01.2382741Z 2025-05-07T19:45:01.2734090Z cmake-4.0.2 | 19.4 MB | ########4 | 85%  2025-05-07T19:45:01.2734484Z 2025-05-07T19:45:01.2734489Z 2025-05-07T19:45:01.3302083Z python-3.13.2 | 31.7 MB | #####6 | 57%  2025-05-07T19:45:01.3355148Z openjdk-23.0.2 | 181.4 MB | #6 | 17% 2025-05-07T19:45:01.3355663Z 2025-05-07T19:45:01.3763315Z bazel-7.5.0 | 47.4 MB | ####7 | 47%  2025-05-07T19:45:01.3763744Z 2025-05-07T19:45:01.3763748Z 2025-05-07T19:45:01.4302306Z python-3.13.2 | 31.7 MB | #######7 | 77%  2025-05-07T19:45:01.4372447Z openjdk-23.0.2 | 181.4 MB | ##2 | 23% 2025-05-07T19:45:01.4372952Z 2025-05-07T19:45:01.4465396Z bazel-7.5.0 | 47.4 MB | ######5 | 65%  2025-05-07T19:45:01.4465801Z 2025-05-07T19:45:01.4465807Z 2025-05-07T19:45:01.4465812Z 2025-05-07T19:45:01.4936188Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:01.4936525Z 
2025-05-07T19:45:01.4936543Z 2025-05-07T19:45:01.4936546Z 2025-05-07T19:45:01.4936550Z 2025-05-07T19:45:01.4936553Z 2025-05-07T19:45:01.5223080Z libgrpc-1.71.0 | 7.6 MB | | 0%  2025-05-07T19:45:01.5223505Z 2025-05-07T19:45:01.5223510Z 2025-05-07T19:45:01.5223513Z 2025-05-07T19:45:01.5223516Z 2025-05-07T19:45:01.5371532Z icu-75.1 | 11.6 MB | #########4 | 94%  2025-05-07T19:45:01.5372189Z openjdk-23.0.2 | 181.4 MB | ##7 | 28% 2025-05-07T19:45:01.5372658Z 2025-05-07T19:45:01.5936919Z bazel-7.5.0 | 47.4 MB | ########2 | 82%  2025-05-07T19:45:01.5937342Z 2025-05-07T19:45:01.5937347Z 2025-05-07T19:45:01.5937352Z 2025-05-07T19:45:01.5937356Z 2025-05-07T19:45:01.5937359Z 2025-05-07T19:45:01.6445408Z libgrpc-1.71.0 | 7.6 MB | ######3 | 64%  2025-05-07T19:45:01.6445843Z 2025-05-07T19:45:01.6625320Z bazel-7.5.0 | 47.4 MB | #########7 | 97%  2025-05-07T19:45:01.7023457Z openjdk-23.0.2 | 181.4 MB | ###2 | 32% 2025-05-07T19:45:01.7023778Z 2025-05-07T19:45:01.7023783Z 2025-05-07T19:45:01.7023786Z 2025-05-07T19:45:01.7026180Z 2025-05-07T19:45:01.7254839Z icu-75.1 | 11.6 MB | ########## | 100%  2025-05-07T19:45:01.7255199Z 2025-05-07T19:45:01.7255205Z 2025-05-07T19:45:01.7255210Z 2025-05-07T19:45:01.7255215Z 2025-05-07T19:45:01.7255221Z 2025-05-07T19:45:01.7625634Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:01.7681054Z openjdk-23.0.2 | 181.4 MB | ###7 | 38% 2025-05-07T19:45:01.7681648Z 2025-05-07T19:45:01.7681654Z 2025-05-07T19:45:01.7681659Z 2025-05-07T19:45:01.7681664Z 2025-05-07T19:45:01.7681670Z 2025-05-07T19:45:01.7681675Z 2025-05-07T19:45:01.7681681Z 2025-05-07T19:45:01.8068638Z libopenblas-0.3.29 | 5.6 MB | | 0%  2025-05-07T19:45:01.8069014Z 2025-05-07T19:45:01.8069019Z 2025-05-07T19:45:01.8069023Z 2025-05-07T19:45:01.8069026Z 2025-05-07T19:45:01.8069038Z 2025-05-07T19:45:01.8069042Z 2025-05-07T19:45:01.8892644Z openblas-0.3.29 | 5.8 MB | | 0%  2025-05-07T19:45:01.9069060Z openjdk-23.0.2 | 181.4 MB | ####2 | 43% 2025-05-07T19:45:01.9069365Z 
2025-05-07T19:45:01.9069370Z 2025-05-07T19:45:01.9069373Z 2025-05-07T19:45:01.9069377Z 2025-05-07T19:45:01.9069381Z 2025-05-07T19:45:01.9069384Z 2025-05-07T19:45:01.9089620Z openblas-0.3.29 | 5.8 MB | #####9 | 60%  2025-05-07T19:45:01.9089980Z 2025-05-07T19:45:01.9092505Z 2025-05-07T19:45:01.9093909Z python-3.13.2 | 31.7 MB | ########## | 100%  2025-05-07T19:45:01.9094196Z 2025-05-07T19:45:01.9094617Z 2025-05-07T19:45:01.9486828Z python-3.13.2 | 31.7 MB | ########## | 100%  2025-05-07T19:45:01.9487117Z 2025-05-07T19:45:01.9487290Z 2025-05-07T19:45:01.9487328Z 2025-05-07T19:45:01.9487354Z 2025-05-07T19:45:01.9487358Z 2025-05-07T19:45:01.9487361Z 2025-05-07T19:45:01.9487365Z 2025-05-07T19:45:01.9488021Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:01.9488347Z 2025-05-07T19:45:01.9488359Z 2025-05-07T19:45:01.9488363Z 2025-05-07T19:45:01.9488366Z 2025-05-07T19:45:01.9488382Z 2025-05-07T19:45:01.9488386Z 2025-05-07T19:45:01.9488389Z 2025-05-07T19:45:01.9604120Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:01.9604585Z 2025-05-07T19:45:01.9604705Z 2025-05-07T19:45:01.9604709Z 2025-05-07T19:45:01.9604736Z 2025-05-07T19:45:01.9604740Z 2025-05-07T19:45:01.9604756Z 2025-05-07T19:45:01.9604760Z 2025-05-07T19:45:01.9604763Z 2025-05-07T19:45:01.9887625Z libcups-2.3.3 | 4.3 MB | | 0%  2025-05-07T19:45:01.9888344Z 2025-05-07T19:45:01.9888349Z 2025-05-07T19:45:01.9888468Z 2025-05-07T19:45:01.9888473Z 2025-05-07T19:45:01.9888845Z 2025-05-07T19:45:01.9888851Z 2025-05-07T19:45:01.9888855Z 2025-05-07T19:45:01.9888858Z 2025-05-07T19:45:01.9888871Z 2025-05-07T19:45:02.0262993Z libglib-2.84.0 | 3.8 MB | | 0%  2025-05-07T19:45:02.0858129Z openjdk-23.0.2 | 181.4 MB | ####7 | 47% 2025-05-07T19:45:02.0859033Z 2025-05-07T19:45:02.0859049Z 2025-05-07T19:45:02.0859060Z 2025-05-07T19:45:02.0859071Z 2025-05-07T19:45:02.0859082Z 2025-05-07T19:45:02.0859093Z 2025-05-07T19:45:02.0860304Z openblas-0.3.29 | 5.8 MB | ########## | 100%  
2025-05-07T19:45:02.0861203Z 2025-05-07T19:45:02.0861216Z 2025-05-07T19:45:02.0861227Z 2025-05-07T19:45:02.0861376Z 2025-05-07T19:45:02.0861379Z 2025-05-07T19:45:02.0861384Z 2025-05-07T19:45:02.1119882Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:02.1120311Z 2025-05-07T19:45:02.1120319Z 2025-05-07T19:45:02.1120328Z 2025-05-07T19:45:02.1120338Z 2025-05-07T19:45:02.1120399Z 2025-05-07T19:45:02.1120404Z 2025-05-07T19:45:02.1120411Z 2025-05-07T19:45:02.1120419Z 2025-05-07T19:45:02.1120949Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:02.1121287Z 2025-05-07T19:45:02.1121315Z 2025-05-07T19:45:02.1121318Z 2025-05-07T19:45:02.1121322Z 2025-05-07T19:45:02.1121325Z 2025-05-07T19:45:02.1121329Z 2025-05-07T19:45:02.1121332Z 2025-05-07T19:45:02.1121335Z 2025-05-07T19:45:02.1158833Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:02.1159237Z 2025-05-07T19:45:02.1159265Z 2025-05-07T19:45:02.1159269Z 2025-05-07T19:45:02.1159273Z 2025-05-07T19:45:02.1159276Z 2025-05-07T19:45:02.1159279Z 2025-05-07T19:45:02.1159283Z 2025-05-07T19:45:02.1159286Z 2025-05-07T19:45:02.1159289Z 2025-05-07T19:45:02.1159560Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:02.1159857Z 2025-05-07T19:45:02.1159861Z 2025-05-07T19:45:02.1159885Z 2025-05-07T19:45:02.1159904Z 2025-05-07T19:45:02.1159907Z 2025-05-07T19:45:02.1159911Z 2025-05-07T19:45:02.1159914Z 2025-05-07T19:45:02.1159924Z 2025-05-07T19:45:02.1159928Z 2025-05-07T19:45:02.1262466Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:02.1506804Z openjdk-23.0.2 | 181.4 MB | #####1 | 51% 2025-05-07T19:45:02.1507339Z 2025-05-07T19:45:02.1507346Z 2025-05-07T19:45:02.1507349Z 2025-05-07T19:45:02.1507353Z 2025-05-07T19:45:02.1507356Z 2025-05-07T19:45:02.1507361Z 2025-05-07T19:45:02.1507365Z 2025-05-07T19:45:02.1507371Z 2025-05-07T19:45:02.1507377Z 2025-05-07T19:45:02.1507405Z 2025-05-07T19:45:02.1507409Z 2025-05-07T19:45:02.1582954Z tk-8.6.13 | 3.2 MB | | 0%  
2025-05-07T19:45:02.1583367Z 2025-05-07T19:45:02.1583372Z 2025-05-07T19:45:02.1583377Z 2025-05-07T19:45:02.1583382Z 2025-05-07T19:45:02.1583387Z 2025-05-07T19:45:02.1583416Z 2025-05-07T19:45:02.1583421Z 2025-05-07T19:45:02.1583450Z 2025-05-07T19:45:02.1583454Z 2025-05-07T19:45:02.1583458Z 2025-05-07T19:45:02.1583691Z 2025-05-07T19:45:02.1583694Z 2025-05-07T19:45:02.1598579Z harfbuzz-11.0.0 | 1.6 MB | | 1%  2025-05-07T19:45:02.1599080Z 2025-05-07T19:45:02.1599085Z 2025-05-07T19:45:02.1599089Z 2025-05-07T19:45:02.1599093Z 2025-05-07T19:45:02.1599096Z 2025-05-07T19:45:02.1599099Z 2025-05-07T19:45:02.1599103Z 2025-05-07T19:45:02.1599106Z 2025-05-07T19:45:02.1599110Z 2025-05-07T19:45:02.1599113Z 2025-05-07T19:45:02.2287802Z libprotobuf-5.29.3 | 3.2 MB | | 0%  2025-05-07T19:45:02.2288207Z 2025-05-07T19:45:02.2288212Z 2025-05-07T19:45:02.2288217Z 2025-05-07T19:45:02.2288223Z 2025-05-07T19:45:02.2288227Z 2025-05-07T19:45:02.2288231Z 2025-05-07T19:45:02.2288236Z 2025-05-07T19:45:02.2288241Z 2025-05-07T19:45:02.2288246Z 2025-05-07T19:45:02.2288249Z 2025-05-07T19:45:02.2288252Z 2025-05-07T19:45:02.2288497Z 2025-05-07T19:45:02.2373866Z harfbuzz-11.0.0 | 1.6 MB | ########## | 100%  2025-05-07T19:45:02.2656696Z openjdk-23.0.2 | 181.4 MB | #####5 | 55% 2025-05-07T19:45:02.2657212Z 2025-05-07T19:45:02.2657219Z 2025-05-07T19:45:02.2657225Z 2025-05-07T19:45:02.2657257Z 2025-05-07T19:45:02.2657262Z 2025-05-07T19:45:02.2657267Z 2025-05-07T19:45:02.2657272Z 2025-05-07T19:45:02.2657278Z 2025-05-07T19:45:02.2657285Z 2025-05-07T19:45:02.2657289Z 2025-05-07T19:45:02.2657294Z 2025-05-07T19:45:02.2657615Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:02.2657903Z 2025-05-07T19:45:02.2657930Z 2025-05-07T19:45:02.2657934Z 2025-05-07T19:45:02.2657937Z 2025-05-07T19:45:02.2657941Z 2025-05-07T19:45:02.2657944Z 2025-05-07T19:45:02.2657947Z 2025-05-07T19:45:02.2657950Z 2025-05-07T19:45:02.2657954Z 2025-05-07T19:45:02.2657957Z 2025-05-07T19:45:02.2657960Z 2025-05-07T19:45:02.2671088Z 
tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:02.2671596Z 2025-05-07T19:45:02.2671608Z 2025-05-07T19:45:02.2671622Z 2025-05-07T19:45:02.2671626Z 2025-05-07T19:45:02.2671629Z 2025-05-07T19:45:02.2671632Z 2025-05-07T19:45:02.2671636Z 2025-05-07T19:45:02.2671639Z 2025-05-07T19:45:02.2671643Z 2025-05-07T19:45:02.2671646Z 2025-05-07T19:45:02.2671650Z 2025-05-07T19:45:02.2671653Z 2025-05-07T19:45:02.2674963Z 2025-05-07T19:45:02.2839719Z font-ttf-ubuntu-0.83 | 1.5 MB | 1 | 1%  2025-05-07T19:45:02.2840325Z 2025-05-07T19:45:02.2840333Z 2025-05-07T19:45:02.2840336Z 2025-05-07T19:45:02.2840340Z 2025-05-07T19:45:02.2840343Z 2025-05-07T19:45:02.2840346Z 2025-05-07T19:45:02.2840350Z 2025-05-07T19:45:02.2840353Z 2025-05-07T19:45:02.2840356Z 2025-05-07T19:45:02.2841320Z 2025-05-07T19:45:02.2842455Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:02.2842829Z 2025-05-07T19:45:02.2842834Z 2025-05-07T19:45:02.2842838Z 2025-05-07T19:45:02.2842842Z 2025-05-07T19:45:02.2842856Z 2025-05-07T19:45:02.2842859Z 2025-05-07T19:45:02.2842863Z 2025-05-07T19:45:02.2842866Z 2025-05-07T19:45:02.2842896Z 2025-05-07T19:45:02.2842900Z 2025-05-07T19:45:02.3069290Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:02.3069805Z 2025-05-07T19:45:02.3069810Z 2025-05-07T19:45:02.3069814Z 2025-05-07T19:45:02.3069817Z 2025-05-07T19:45:02.3069821Z 2025-05-07T19:45:02.3069825Z 2025-05-07T19:45:02.3069856Z 2025-05-07T19:45:02.3069859Z 2025-05-07T19:45:02.3069862Z 2025-05-07T19:45:02.3069865Z 2025-05-07T19:45:02.3069869Z 2025-05-07T19:45:02.3069872Z 2025-05-07T19:45:02.3069876Z 2025-05-07T19:45:02.3069888Z 2025-05-07T19:45:02.3188632Z libgfortran5-15.1.0 | 1.5 MB | 1 | 1%  2025-05-07T19:45:02.3189291Z 2025-05-07T19:45:02.3189299Z 2025-05-07T19:45:02.3189329Z 2025-05-07T19:45:02.3189337Z 2025-05-07T19:45:02.3189344Z 2025-05-07T19:45:02.3189352Z 2025-05-07T19:45:02.3189685Z 2025-05-07T19:45:02.3189689Z 2025-05-07T19:45:02.3189692Z 2025-05-07T19:45:02.3189696Z 
2025-05-07T19:45:02.3189699Z 2025-05-07T19:45:02.3189703Z 2025-05-07T19:45:02.3189706Z 2025-05-07T19:45:02.3378741Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:02.3379280Z 2025-05-07T19:45:02.3379285Z 2025-05-07T19:45:02.3379288Z 2025-05-07T19:45:02.3379292Z 2025-05-07T19:45:02.3379295Z 2025-05-07T19:45:02.3379299Z 2025-05-07T19:45:02.3379303Z 2025-05-07T19:45:02.3379306Z 2025-05-07T19:45:02.3379309Z 2025-05-07T19:45:02.3379313Z 2025-05-07T19:45:02.3379316Z 2025-05-07T19:45:02.3379320Z 2025-05-07T19:45:02.3379323Z 2025-05-07T19:45:02.3379326Z 2025-05-07T19:45:02.3379330Z 2025-05-07T19:45:02.3518061Z krb5-1.21.3 | 1.3 MB | 1 | 1%  2025-05-07T19:45:02.3587228Z openjdk-23.0.2 | 181.4 MB | #####9 | 59% 2025-05-07T19:45:02.3588677Z 2025-05-07T19:45:02.3588770Z 2025-05-07T19:45:02.3588789Z 2025-05-07T19:45:02.3588809Z 2025-05-07T19:45:02.3588828Z 2025-05-07T19:45:02.3588842Z 2025-05-07T19:45:02.3588858Z 2025-05-07T19:45:02.3588874Z 2025-05-07T19:45:02.3588894Z 2025-05-07T19:45:02.3588910Z 2025-05-07T19:45:02.3588930Z 2025-05-07T19:45:02.3588948Z 2025-05-07T19:45:02.3588965Z 2025-05-07T19:45:02.3588981Z 2025-05-07T19:45:02.3589004Z 2025-05-07T19:45:02.3589016Z 2025-05-07T19:45:02.3590298Z libabseil-20250127.1 | 1.3 MB | 1 | 1%  2025-05-07T19:45:02.3590927Z 2025-05-07T19:45:02.3590931Z 2025-05-07T19:45:02.3590935Z 2025-05-07T19:45:02.3590938Z 2025-05-07T19:45:02.3590942Z 2025-05-07T19:45:02.3590945Z 2025-05-07T19:45:02.3590948Z 2025-05-07T19:45:02.3590961Z 2025-05-07T19:45:02.3590964Z 2025-05-07T19:45:02.3590967Z 2025-05-07T19:45:02.3590971Z 2025-05-07T19:45:02.3590980Z 2025-05-07T19:45:02.3590983Z 2025-05-07T19:45:02.3591012Z 2025-05-07T19:45:02.4029438Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:02.4030034Z 2025-05-07T19:45:02.4030043Z 2025-05-07T19:45:02.4030046Z 2025-05-07T19:45:02.4030050Z 2025-05-07T19:45:02.4030053Z 2025-05-07T19:45:02.4030057Z 2025-05-07T19:45:02.4030085Z 2025-05-07T19:45:02.4030089Z 
2025-05-07T19:45:02.4030093Z 2025-05-07T19:45:02.4030097Z 2025-05-07T19:45:02.4030100Z 2025-05-07T19:45:02.4030103Z 2025-05-07T19:45:02.4030107Z 2025-05-07T19:45:02.4030110Z 2025-05-07T19:45:02.4030114Z 2025-05-07T19:45:02.4089422Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:02.4089783Z 2025-05-07T19:45:02.4089835Z 2025-05-07T19:45:02.4089839Z 2025-05-07T19:45:02.4090213Z 2025-05-07T19:45:02.4090222Z 2025-05-07T19:45:02.4090243Z 2025-05-07T19:45:02.4090248Z 2025-05-07T19:45:02.4090253Z 2025-05-07T19:45:02.4090276Z 2025-05-07T19:45:02.4090280Z 2025-05-07T19:45:02.4090285Z 2025-05-07T19:45:02.4090289Z 2025-05-07T19:45:02.4090305Z 2025-05-07T19:45:02.4090310Z 2025-05-07T19:45:02.4090315Z 2025-05-07T19:45:02.4090773Z 2025-05-07T19:45:02.4170638Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:02.4171039Z 2025-05-07T19:45:02.4202487Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:02.4202787Z 2025-05-07T19:45:02.4202793Z 2025-05-07T19:45:02.4202796Z 2025-05-07T19:45:02.4202800Z 2025-05-07T19:45:02.4202803Z 2025-05-07T19:45:02.4202807Z 2025-05-07T19:45:02.4202810Z 2025-05-07T19:45:02.4202814Z 2025-05-07T19:45:02.4202818Z 2025-05-07T19:45:02.4202823Z 2025-05-07T19:45:02.4202828Z 2025-05-07T19:45:02.4202833Z 2025-05-07T19:45:02.4202838Z 2025-05-07T19:45:02.4202843Z 2025-05-07T19:45:02.4202846Z 2025-05-07T19:45:02.4202851Z 2025-05-07T19:45:02.4202854Z 2025-05-07T19:45:02.4453601Z cairo-1.18.4 | 955 KB | 1 | 2%  2025-05-07T19:45:02.4454203Z 2025-05-07T19:45:02.4454208Z 2025-05-07T19:45:02.4454212Z 2025-05-07T19:45:02.4454215Z 2025-05-07T19:45:02.4454219Z 2025-05-07T19:45:02.4454222Z 2025-05-07T19:45:02.4454226Z 2025-05-07T19:45:02.4454230Z 2025-05-07T19:45:02.4454233Z 2025-05-07T19:45:02.4454236Z 2025-05-07T19:45:02.4454239Z 2025-05-07T19:45:02.4454243Z 2025-05-07T19:45:02.4454246Z 2025-05-07T19:45:02.4454263Z 2025-05-07T19:45:02.4454267Z 2025-05-07T19:45:02.4454270Z 2025-05-07T19:45:02.4454273Z 2025-05-07T19:45:02.4484005Z 
cairo-1.18.4 | 955 KB | ########## | 100%  2025-05-07T19:45:02.4484344Z 2025-05-07T19:45:02.4484349Z 2025-05-07T19:45:02.4484352Z 2025-05-07T19:45:02.4484370Z 2025-05-07T19:45:02.4484374Z 2025-05-07T19:45:02.4484377Z 2025-05-07T19:45:02.4484380Z 2025-05-07T19:45:02.4484384Z 2025-05-07T19:45:02.4484387Z 2025-05-07T19:45:02.4484610Z 2025-05-07T19:45:02.4484614Z 2025-05-07T19:45:02.4484618Z 2025-05-07T19:45:02.4484621Z 2025-05-07T19:45:02.4484633Z 2025-05-07T19:45:02.4484636Z 2025-05-07T19:45:02.4484639Z 2025-05-07T19:45:02.4484643Z 2025-05-07T19:45:02.4484646Z 2025-05-07T19:45:02.4484649Z 2025-05-07T19:45:02.4602908Z ... (more hidden) ... 2025-05-07T19:45:02.4639955Z openjdk-23.0.2 | 181.4 MB | ######3 | 63% 2025-05-07T19:45:02.4640233Z 2025-05-07T19:45:02.4640252Z 2025-05-07T19:45:02.4640256Z 2025-05-07T19:45:02.4640259Z 2025-05-07T19:45:02.4640263Z 2025-05-07T19:45:02.4640266Z 2025-05-07T19:45:02.4640269Z 2025-05-07T19:45:02.4640273Z 2025-05-07T19:45:02.4640276Z 2025-05-07T19:45:02.4640279Z 2025-05-07T19:45:02.4640283Z 2025-05-07T19:45:02.4640286Z 2025-05-07T19:45:02.4640290Z 2025-05-07T19:45:02.4640293Z 2025-05-07T19:45:02.4640297Z 2025-05-07T19:45:02.4640300Z 2025-05-07T19:45:02.4640303Z 2025-05-07T19:45:02.4640307Z 2025-05-07T19:45:02.4750424Z pcre2-10.44 | 934 KB | 1 | 2%  2025-05-07T19:45:02.4751420Z 2025-05-07T19:45:02.4751471Z 2025-05-07T19:45:02.4751482Z 2025-05-07T19:45:02.4751492Z 2025-05-07T19:45:02.4751502Z 2025-05-07T19:45:02.4751512Z 2025-05-07T19:45:02.4751522Z 2025-05-07T19:45:02.4751532Z 2025-05-07T19:45:02.4751542Z 2025-05-07T19:45:02.4751553Z 2025-05-07T19:45:02.4751563Z 2025-05-07T19:45:02.4751573Z 2025-05-07T19:45:02.4751583Z 2025-05-07T19:45:02.4751609Z 2025-05-07T19:45:02.4751619Z 2025-05-07T19:45:02.4751629Z 2025-05-07T19:45:02.4751639Z 2025-05-07T19:45:02.4751648Z 2025-05-07T19:45:02.4751658Z 2025-05-07T19:45:02.5000509Z ... (more hidden) ... 
2025-05-07T19:45:02.5000849Z 2025-05-07T19:45:02.5000866Z 2025-05-07T19:45:02.5000870Z 2025-05-07T19:45:02.5000874Z 2025-05-07T19:45:02.5000877Z 2025-05-07T19:45:02.5000880Z 2025-05-07T19:45:02.5000900Z 2025-05-07T19:45:02.5000904Z 2025-05-07T19:45:02.5000907Z 2025-05-07T19:45:02.5000911Z 2025-05-07T19:45:02.5000921Z 2025-05-07T19:45:02.5000924Z 2025-05-07T19:45:02.5000928Z 2025-05-07T19:45:02.5000931Z 2025-05-07T19:45:02.5000934Z 2025-05-07T19:45:02.5000938Z 2025-05-07T19:45:02.5000941Z 2025-05-07T19:45:02.5000944Z 2025-05-07T19:45:02.5662871Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:02.5663883Z 2025-05-07T19:45:02.5663897Z 2025-05-07T19:45:02.5663908Z 2025-05-07T19:45:02.5663919Z 2025-05-07T19:45:02.5737909Z icu-75.1 | 11.6 MB | ########## | 100%  2025-05-07T19:45:02.6441850Z openjdk-23.0.2 | 181.4 MB | ######6 | 67% 2025-05-07T19:45:02.6442165Z 2025-05-07T19:45:02.6442171Z 2025-05-07T19:45:02.6442177Z 2025-05-07T19:45:02.6442182Z 2025-05-07T19:45:02.6442187Z 2025-05-07T19:45:02.6924292Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:02.7927033Z openjdk-23.0.2 | 181.4 MB | ####### | 70% 2025-05-07T19:45:02.8927517Z openjdk-23.0.2 | 181.4 MB | #######4 | 75% 2025-05-07T19:45:02.9930038Z openjdk-23.0.2 | 181.4 MB | #######8 | 79% 2025-05-07T19:45:03.0306449Z openjdk-23.0.2 | 181.4 MB | ########3 | 83% 2025-05-07T19:45:03.0306829Z 2025-05-07T19:45:03.0306837Z 2025-05-07T19:45:03.0306841Z 2025-05-07T19:45:03.0306846Z 2025-05-07T19:45:03.0306849Z 2025-05-07T19:45:03.0306857Z 2025-05-07T19:45:03.0306860Z 2025-05-07T19:45:03.0930625Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:03.1932839Z openjdk-23.0.2 | 181.4 MB | ########7 | 88% 2025-05-07T19:45:03.2934953Z openjdk-23.0.2 | 181.4 MB | #########2 | 93% 2025-05-07T19:45:03.6555282Z openjdk-23.0.2 | 181.4 MB | #########7 | 98% 2025-05-07T19:45:03.6555658Z 2025-05-07T19:45:03.6555692Z 2025-05-07T19:45:03.6555698Z 2025-05-07T19:45:03.6555704Z 
2025-05-07T19:45:03.6556002Z 2025-05-07T19:45:03.6556009Z 2025-05-07T19:45:03.8358121Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:03.8358580Z 2025-05-07T19:45:03.8358585Z 2025-05-07T19:45:03.8358589Z 2025-05-07T19:45:03.8358616Z 2025-05-07T19:45:03.8358620Z 2025-05-07T19:45:03.8358623Z 2025-05-07T19:45:03.8358627Z 2025-05-07T19:45:03.8358632Z 2025-05-07T19:45:04.2182118Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:04.2182515Z 2025-05-07T19:45:04.2182521Z 2025-05-07T19:45:04.2182527Z 2025-05-07T19:45:04.2182533Z 2025-05-07T19:45:04.2182538Z 2025-05-07T19:45:04.2182568Z 2025-05-07T19:45:04.2182572Z 2025-05-07T19:45:04.2182577Z 2025-05-07T19:45:04.2182580Z 2025-05-07T19:45:04.3694662Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:04.3695036Z 2025-05-07T19:45:04.3695043Z 2025-05-07T19:45:04.3695049Z 2025-05-07T19:45:04.3695055Z 2025-05-07T19:45:04.3695062Z 2025-05-07T19:45:04.3695097Z 2025-05-07T19:45:04.3695101Z 2025-05-07T19:45:04.3695130Z 2025-05-07T19:45:04.3695134Z 2025-05-07T19:45:04.3695158Z 2025-05-07T19:45:04.3695162Z 2025-05-07T19:45:04.3695165Z 2025-05-07T19:45:04.3695475Z harfbuzz-11.0.0 | 1.6 MB | ########## | 100%  2025-05-07T19:45:04.3695789Z 2025-05-07T19:45:04.3695793Z 2025-05-07T19:45:04.3695796Z 2025-05-07T19:45:04.3695799Z 2025-05-07T19:45:04.3695839Z 2025-05-07T19:45:04.3695843Z 2025-05-07T19:45:04.3695846Z 2025-05-07T19:45:04.3695850Z 2025-05-07T19:45:04.3695853Z 2025-05-07T19:45:04.3695857Z 2025-05-07T19:45:04.3695860Z 2025-05-07T19:45:04.3695863Z 2025-05-07T19:45:04.8444928Z harfbuzz-11.0.0 | 1.6 MB | ########## | 100%  2025-05-07T19:45:04.8445342Z 2025-05-07T19:45:04.8445348Z 2025-05-07T19:45:04.8445353Z 2025-05-07T19:45:04.8445358Z 2025-05-07T19:45:04.8445362Z 2025-05-07T19:45:04.8445370Z 2025-05-07T19:45:04.8445375Z 2025-05-07T19:45:04.8445417Z 2025-05-07T19:45:04.8445422Z 2025-05-07T19:45:04.8445426Z 2025-05-07T19:45:04.8445429Z 2025-05-07T19:45:04.9624055Z tk-8.6.13 | 3.2 MB | ########## 
| 100%  2025-05-07T19:45:04.9624386Z 2025-05-07T19:45:04.9624416Z 2025-05-07T19:45:05.0491749Z python-3.13.2 | 31.7 MB | ########## | 100%  2025-05-07T19:45:05.0492141Z 2025-05-07T19:45:05.0492147Z 2025-05-07T19:45:05.0492151Z 2025-05-07T19:45:05.0492154Z 2025-05-07T19:45:05.0492158Z 2025-05-07T19:45:05.0492162Z 2025-05-07T19:45:05.0492165Z 2025-05-07T19:45:05.0492170Z 2025-05-07T19:45:05.0492196Z 2025-05-07T19:45:05.0492200Z 2025-05-07T19:45:05.0492204Z 2025-05-07T19:45:05.0492207Z 2025-05-07T19:45:05.0492226Z 2025-05-07T19:45:05.0492569Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:05.0493019Z 2025-05-07T19:45:05.0493026Z 2025-05-07T19:45:05.0493032Z 2025-05-07T19:45:05.0493062Z 2025-05-07T19:45:05.0493107Z 2025-05-07T19:45:05.0493112Z 2025-05-07T19:45:05.0493117Z 2025-05-07T19:45:05.0493121Z 2025-05-07T19:45:05.0493381Z 2025-05-07T19:45:05.0493385Z 2025-05-07T19:45:05.0493388Z 2025-05-07T19:45:05.0493392Z 2025-05-07T19:45:05.0493395Z 2025-05-07T19:45:05.0493783Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:05.0494147Z 2025-05-07T19:45:05.0494151Z 2025-05-07T19:45:05.0494157Z 2025-05-07T19:45:05.1539551Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:05.1539920Z 2025-05-07T19:45:05.1539927Z 2025-05-07T19:45:05.1539931Z 2025-05-07T19:45:05.1539936Z 2025-05-07T19:45:05.1539939Z 2025-05-07T19:45:05.1539942Z 2025-05-07T19:45:05.1539972Z 2025-05-07T19:45:05.1539976Z 2025-05-07T19:45:05.1539979Z 2025-05-07T19:45:05.1539983Z 2025-05-07T19:45:05.1539987Z 2025-05-07T19:45:05.1539991Z 2025-05-07T19:45:05.1539995Z 2025-05-07T19:45:05.1539999Z 2025-05-07T19:45:05.1543760Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:05.1544185Z 2025-05-07T19:45:05.1544222Z 2025-05-07T19:45:05.1544225Z 2025-05-07T19:45:05.1544229Z 2025-05-07T19:45:05.1544232Z 2025-05-07T19:45:05.1544236Z 2025-05-07T19:45:05.1544239Z 2025-05-07T19:45:05.1544242Z 2025-05-07T19:45:05.1544246Z 2025-05-07T19:45:05.1544249Z 
2025-05-07T19:45:05.1544253Z 2025-05-07T19:45:05.1544256Z 2025-05-07T19:45:05.1544261Z 2025-05-07T19:45:05.1544280Z 2025-05-07T19:45:05.2078912Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:05.2079374Z 2025-05-07T19:45:05.2079380Z 2025-05-07T19:45:05.2079385Z 2025-05-07T19:45:05.2079415Z 2025-05-07T19:45:05.2079419Z 2025-05-07T19:45:05.2079424Z 2025-05-07T19:45:05.2079428Z 2025-05-07T19:45:05.2079434Z 2025-05-07T19:45:05.2079438Z 2025-05-07T19:45:05.2079443Z 2025-05-07T19:45:05.2079448Z 2025-05-07T19:45:05.2079453Z 2025-05-07T19:45:05.2079456Z 2025-05-07T19:45:05.2079486Z 2025-05-07T19:45:05.2079489Z 2025-05-07T19:45:05.2079805Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:05.2080165Z 2025-05-07T19:45:05.2080169Z 2025-05-07T19:45:05.2080172Z 2025-05-07T19:45:05.2080176Z 2025-05-07T19:45:05.2080179Z 2025-05-07T19:45:05.2080182Z 2025-05-07T19:45:05.2080185Z 2025-05-07T19:45:05.2080189Z 2025-05-07T19:45:05.2080192Z 2025-05-07T19:45:05.2080195Z 2025-05-07T19:45:05.2080199Z 2025-05-07T19:45:05.2080202Z 2025-05-07T19:45:05.2080206Z 2025-05-07T19:45:05.2080209Z 2025-05-07T19:45:05.2080212Z 2025-05-07T19:45:05.2406213Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:05.2406571Z 2025-05-07T19:45:05.2406576Z 2025-05-07T19:45:05.2406579Z 2025-05-07T19:45:05.2406606Z 2025-05-07T19:45:05.2406609Z 2025-05-07T19:45:05.2406613Z 2025-05-07T19:45:05.2406616Z 2025-05-07T19:45:05.2406619Z 2025-05-07T19:45:05.2406636Z 2025-05-07T19:45:05.2406647Z 2025-05-07T19:45:05.3367347Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:05.3388639Z openjdk-23.0.2 | 181.4 MB | ########## | 100% 2025-05-07T19:45:05.3388925Z 2025-05-07T19:45:05.3388929Z 2025-05-07T19:45:05.3388933Z 2025-05-07T19:45:05.3388937Z 2025-05-07T19:45:05.3388942Z 2025-05-07T19:45:05.3388962Z 2025-05-07T19:45:05.3388967Z 2025-05-07T19:45:05.3388972Z 2025-05-07T19:45:05.3388977Z 2025-05-07T19:45:05.3388982Z 2025-05-07T19:45:05.3388987Z 2025-05-07T19:45:05.3388993Z 
2025-05-07T19:45:05.3389000Z 2025-05-07T19:45:05.3389005Z 2025-05-07T19:45:05.3389010Z 2025-05-07T19:45:05.3389014Z 2025-05-07T19:45:05.3389018Z 2025-05-07T19:45:05.3390641Z cairo-1.18.4 | 955 KB | ########## | 100%  2025-05-07T19:45:05.3390977Z 2025-05-07T19:45:05.3390983Z 2025-05-07T19:45:05.3390989Z 2025-05-07T19:45:05.3390994Z 2025-05-07T19:45:05.3390999Z 2025-05-07T19:45:05.3391036Z 2025-05-07T19:45:05.3391039Z 2025-05-07T19:45:05.3391043Z 2025-05-07T19:45:05.3391241Z 2025-05-07T19:45:05.3391244Z 2025-05-07T19:45:05.3391248Z 2025-05-07T19:45:05.3391251Z 2025-05-07T19:45:05.3391255Z 2025-05-07T19:45:05.3391258Z 2025-05-07T19:45:05.3391261Z 2025-05-07T19:45:05.3391264Z 2025-05-07T19:45:05.3391268Z 2025-05-07T19:45:05.3868416Z cairo-1.18.4 | 955 KB | ########## | 100%  2025-05-07T19:45:05.3868778Z 2025-05-07T19:45:05.3868782Z 2025-05-07T19:45:05.3868786Z 2025-05-07T19:45:05.3868789Z 2025-05-07T19:45:05.3868808Z 2025-05-07T19:45:05.3868811Z 2025-05-07T19:45:05.3868815Z 2025-05-07T19:45:05.3868818Z 2025-05-07T19:45:05.3868822Z 2025-05-07T19:45:05.3868825Z 2025-05-07T19:45:05.3868829Z 2025-05-07T19:45:05.3868833Z 2025-05-07T19:45:05.3868836Z 2025-05-07T19:45:05.3868839Z 2025-05-07T19:45:05.3868842Z 2025-05-07T19:45:05.3868846Z 2025-05-07T19:45:05.3868849Z 2025-05-07T19:45:05.3869032Z 2025-05-07T19:45:05.3869037Z 2025-05-07T19:45:05.3871703Z ... (more hidden) ... 2025-05-07T19:45:05.3872025Z 2025-05-07T19:45:05.3872043Z 2025-05-07T19:45:05.3872046Z 2025-05-07T19:45:05.3872050Z 2025-05-07T19:45:05.3872053Z 2025-05-07T19:45:05.3872056Z 2025-05-07T19:45:05.3872060Z 2025-05-07T19:45:05.3872063Z 2025-05-07T19:45:05.3872066Z 2025-05-07T19:45:05.3872070Z 2025-05-07T19:45:05.3872073Z 2025-05-07T19:45:05.3872076Z 2025-05-07T19:45:05.3872080Z 2025-05-07T19:45:05.3872083Z 2025-05-07T19:45:05.3872087Z 2025-05-07T19:45:05.3872090Z 2025-05-07T19:45:05.3872108Z 2025-05-07T19:45:05.3872111Z 2025-05-07T19:45:05.3872114Z 2025-05-07T19:45:05.6115712Z ... (more hidden) ... 
2025-05-07T19:45:05.6116040Z 2025-05-07T19:45:05.6116198Z 2025-05-07T19:45:05.6116202Z 2025-05-07T19:45:05.6116205Z 2025-05-07T19:45:05.6116243Z 2025-05-07T19:45:05.6116246Z 2025-05-07T19:45:05.6116262Z 2025-05-07T19:45:05.6116266Z 2025-05-07T19:45:05.6116269Z 2025-05-07T19:45:05.6116273Z 2025-05-07T19:45:05.6116286Z 2025-05-07T19:45:05.6116323Z 2025-05-07T19:45:05.6116328Z 2025-05-07T19:45:05.6116341Z 2025-05-07T19:45:05.6116345Z 2025-05-07T19:45:05.6116443Z 2025-05-07T19:45:05.6117567Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:05.6118020Z 2025-05-07T19:45:05.6118027Z 2025-05-07T19:45:05.6118032Z 2025-05-07T19:45:05.6118036Z 2025-05-07T19:45:05.6118041Z 2025-05-07T19:45:05.6118044Z 2025-05-07T19:45:05.6118049Z 2025-05-07T19:45:05.6118054Z 2025-05-07T19:45:05.6118059Z 2025-05-07T19:45:05.6118064Z 2025-05-07T19:45:05.6118068Z 2025-05-07T19:45:05.6118073Z 2025-05-07T19:45:05.6118076Z 2025-05-07T19:45:05.6118082Z 2025-05-07T19:45:05.6118085Z 2025-05-07T19:45:05.6118103Z 2025-05-07T19:45:05.8050177Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:05.8050656Z 2025-05-07T19:45:05.8050661Z 2025-05-07T19:45:05.8050666Z 2025-05-07T19:45:05.8050691Z 2025-05-07T19:45:05.8050695Z 2025-05-07T19:45:05.8050699Z 2025-05-07T19:45:05.8050702Z 2025-05-07T19:45:05.8050706Z 2025-05-07T19:45:05.8050709Z 2025-05-07T19:45:05.8050713Z 2025-05-07T19:45:05.8050738Z 2025-05-07T19:45:05.8050742Z 2025-05-07T19:45:05.8050745Z 2025-05-07T19:45:05.8050749Z 2025-05-07T19:45:05.8050752Z 2025-05-07T19:45:05.8050755Z 2025-05-07T19:45:05.8050758Z 2025-05-07T19:45:05.8050762Z 2025-05-07T19:45:05.8051115Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:05.8051443Z 2025-05-07T19:45:05.8051467Z 2025-05-07T19:45:05.8051471Z 2025-05-07T19:45:05.8051474Z 2025-05-07T19:45:05.8051478Z 2025-05-07T19:45:05.8051481Z 2025-05-07T19:45:05.8051484Z 2025-05-07T19:45:05.8051488Z 2025-05-07T19:45:05.8051491Z 2025-05-07T19:45:05.8051495Z 2025-05-07T19:45:05.8051498Z 
2025-05-07T19:45:05.8051509Z 2025-05-07T19:45:05.8051512Z 2025-05-07T19:45:05.8051516Z 2025-05-07T19:45:05.8051519Z 2025-05-07T19:45:05.8051753Z 2025-05-07T19:45:05.8051757Z 2025-05-07T19:45:05.8051760Z 2025-05-07T19:45:06.5861486Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:06.5861857Z 2025-05-07T19:45:08.4250790Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:08.4259814Z openjdk-23.0.2 | 181.4 MB | ########## | 100% 2025-05-07T19:45:08.4260580Z 2025-05-07T19:45:08.4260595Z 2025-05-07T19:45:08.4260607Z 2025-05-07T19:45:08.4260669Z 2025-05-07T19:45:08.4260680Z 2025-05-07T19:45:08.4260692Z 2025-05-07T19:45:08.4260703Z 2025-05-07T19:45:08.4260714Z 2025-05-07T19:45:08.4260752Z 2025-05-07T19:45:08.4260764Z 2025-05-07T19:45:08.4260779Z 2025-05-07T19:45:08.4260789Z 2025-05-07T19:45:08.4260800Z 2025-05-07T19:45:08.4260810Z 2025-05-07T19:45:08.4260821Z 2025-05-07T19:45:08.4260832Z 2025-05-07T19:45:08.4261380Z 2025-05-07T19:45:08.4261395Z 2025-05-07T19:45:08.4261409Z 2025-05-07T19:45:08.4261650Z 2025-05-07T19:45:08.4262746Z  2025-05-07T19:45:08.4264122Z 2025-05-07T19:45:08.4265204Z 2025-05-07T19:45:08.4265913Z  2025-05-07T19:45:08.4266136Z 2025-05-07T19:45:08.4266140Z 2025-05-07T19:45:08.4266339Z  2025-05-07T19:45:08.4266561Z 2025-05-07T19:45:08.4266564Z 2025-05-07T19:45:08.4266568Z 2025-05-07T19:45:08.4266759Z  2025-05-07T19:45:08.4267022Z 2025-05-07T19:45:08.4267026Z 2025-05-07T19:45:08.4267030Z 2025-05-07T19:45:08.4267034Z 2025-05-07T19:45:08.4267258Z  2025-05-07T19:45:08.4267500Z 2025-05-07T19:45:08.4267504Z 2025-05-07T19:45:08.4267515Z 2025-05-07T19:45:08.4267519Z 2025-05-07T19:45:08.4267522Z 2025-05-07T19:45:08.4267710Z  2025-05-07T19:45:08.4267949Z 2025-05-07T19:45:08.4267971Z 2025-05-07T19:45:08.4267974Z 2025-05-07T19:45:08.4267978Z 2025-05-07T19:45:08.4267981Z 2025-05-07T19:45:08.4267984Z 2025-05-07T19:45:08.4268177Z  2025-05-07T19:45:08.4268405Z 2025-05-07T19:45:08.4268408Z 2025-05-07T19:45:08.4268412Z 2025-05-07T19:45:08.4268415Z 
2025-05-07T19:45:08.4306574Z done 2025-05-07T19:45:08.7427081Z Preparing transaction: | / - done 2025-05-07T19:45:12.6099007Z Verifying transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:45:15.4212632Z Executing transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:45:15.8334841Z [INSTALL] Adding symlink librhash.so.0, which is needed by CMake ...
2025-05-07T19:45:17.6720318Z + ln -s /github/home/miniconda/envs/build_binary/lib/librhash.so /github/home/miniconda/envs/build_binary/lib/librhash.so.0 2025-05-07T19:45:17.6721265Z 2025-05-07T19:45:17.6739039Z 2025-05-07T19:45:17.6765923Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install build 2025-05-07T19:45:20.0507655Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:45:20.0509729Z 2025-05-07T19:45:20.0509952Z Collecting build 2025-05-07T19:45:20.0510346Z Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB) 2025-05-07T19:45:20.0511179Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from build) (25.0) 2025-05-07T19:45:20.0511878Z Collecting pyproject_hooks (from build) 2025-05-07T19:45:20.0512317Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl.metadata (1.3 kB) 2025-05-07T19:45:20.0512799Z Downloading build-1.2.2.post1-py3-none-any.whl (22 kB) 2025-05-07T19:45:20.0513265Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl (10 kB) 2025-05-07T19:45:20.0513693Z Installing collected packages: pyproject_hooks, build 2025-05-07T19:45:20.0513978Z 2025-05-07T19:45:20.0514175Z Successfully installed build-1.2.2.post1 pyproject_hooks-1.2.0 2025-05-07T19:45:20.0514467Z 2025-05-07T19:45:21.9333687Z /github/home/miniconda/envs/build_binary/bin/make 2025-05-07T19:45:21.9334053Z 2025-05-07T19:45:22.0087023Z [CHECK] Binary make found in PATH 2025-05-07T19:45:23.8248855Z /github/home/miniconda/envs/build_binary/bin/cmake 2025-05-07T19:45:23.8249199Z 2025-05-07T19:45:23.9026810Z [CHECK] Binary cmake found in PATH 2025-05-07T19:45:25.7045155Z 
/github/home/miniconda/envs/build_binary/bin/ninja 2025-05-07T19:45:25.7046034Z 2025-05-07T19:45:25.7629573Z [CHECK] Binary ninja found in PATH 2025-05-07T19:45:27.6502472Z [CHECK] Python (sub-)package 'click' found ... 2025-05-07T19:45:29.6848848Z [CHECK] Python (sub-)package 'hypothesis' found ... 2025-05-07T19:45:31.6281512Z [CHECK] Python (sub-)package 'jinja2' found ... 2025-05-07T19:45:33.6324264Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:45:35.5083691Z [CHECK] Python (sub-)package 'wheel' found ... 2025-05-07T19:45:35.5084958Z [INSTALL] Successfully installed all the build tools 2025-05-07T19:45:35.5167339Z ##[group]Run . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:35.5167878Z . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:35.5168555Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:35.5168940Z env: 2025-05-07T19:45:35.5169190Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:35.5169549Z BUILD_ENV: build_binary 2025-05-07T19:45:35.5169818Z BUILD_TARGET: genai 2025-05-07T19:45:35.5170088Z BUILD_VARIANT: cuda 2025-05-07T19:45:35.5170394Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:35.5170677Z ##[endgroup] 2025-05-07T19:45:35.9238691Z ################################################################################ 2025-05-07T19:45:35.9239101Z # Install CUDA 2025-05-07T19:45:35.9239344Z # 2025-05-07T19:45:35.9263080Z # [2025-05-07T19:45:35.925Z] + install_cuda build_binary 12.8.0 2025-05-07T19:45:35.9265140Z ################################################################################ 2025-05-07T19:45:35.9266445Z 2025-05-07T19:45:35.9287736Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:36.0197752Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:36.0198263Z [SETUP] Cleaning up Conda packages ... 
2025-05-07T19:45:36.0200112Z + conda clean --packages --tarball -y 2025-05-07T19:45:36.0200555Z 2025-05-07T19:45:36.4843634Z Will remove 130 (465.2 MB) tarball(s). 2025-05-07T19:45:36.4844076Z Will remove 14 (1.7 MB) package(s). 2025-05-07T19:45:36.5378194Z 2025-05-07T19:45:36.5388870Z + conda clean --all -y 2025-05-07T19:45:36.5389439Z 2025-05-07T19:45:37.1534347Z There are no unused tarball(s) to remove. 2025-05-07T19:45:37.1534873Z Will remove 1 index cache(s). 2025-05-07T19:45:37.1535192Z There are no unused package(s) to remove. 2025-05-07T19:45:37.1535557Z There are no tempfile(s) to remove. 2025-05-07T19:45:37.1536040Z There are no logfile(s) to remove. 2025-05-07T19:45:37.2096285Z 2025-05-07T19:45:37.2116126Z [INSTALL] Installing CUDA 12.8.0 ... 2025-05-07T19:45:37.2143577Z [EXEC] [ATTEMPT 0/3] + conda install --force-reinstall -n build_binary -c conda-forge --override-channels -y cuda=12.8.0 2025-05-07T19:45:38.0480539Z Channels: 2025-05-07T19:45:38.0480796Z - conda-forge 2025-05-07T19:45:38.0481048Z Platform: linux-64 2025-05-07T19:45:47.6444947Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:45:49.1296625Z Solving environment: \ | / - done 2025-05-07T19:45:49.2578374Z 2025-05-07T19:45:49.2578777Z ## Package Plan ## 2025-05-07T19:45:49.2578985Z 2025-05-07T19:45:49.2579204Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:49.2579624Z 2025-05-07T19:45:49.2579729Z added / updated specs: 2025-05-07T19:45:49.2580008Z - cuda=12.8.0 2025-05-07T19:45:49.2580150Z 2025-05-07T19:45:49.2580155Z 2025-05-07T19:45:49.2580322Z The following packages will be downloaded: 2025-05-07T19:45:49.2580554Z 2025-05-07T19:45:49.2580706Z package | build 2025-05-07T19:45:49.2581055Z ---------------------------|----------------- 2025-05-07T19:45:49.2581427Z attr-2.5.1 | h166bdaf_1 69 KB conda-forge 2025-05-07T19:45:49.2581881Z binutils-2.40 | h4852527_7 31 KB conda-forge 2025-05-07T19:45:49.2582353Z 
c-compiler-1.5.2 | h0b41bf4_0 6 KB conda-forge 2025-05-07T19:45:49.2582790Z cuda-12.8.0 | ha804496_0 26 KB conda-forge 2025-05-07T19:45:49.2583262Z cuda-cccl_linux-64-12.8.55 | ha770c72_1 1.0 MB conda-forge 2025-05-07T19:45:49.2583801Z cuda-command-line-tools-12.8.0| ha770c72_0 20 KB conda-forge 2025-05-07T19:45:49.2584341Z cuda-compiler-12.8.0 | hbad6d8a_0 20 KB conda-forge 2025-05-07T19:45:49.2585155Z cuda-crt-dev_linux-64-12.8.61| ha770c72_1 90 KB conda-forge 2025-05-07T19:45:49.2585844Z cuda-crt-tools-12.8.61 | ha770c72_1 27 KB conda-forge 2025-05-07T19:45:49.2586347Z cuda-cudart-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:45:49.2586830Z cuda-cudart-dev-12.8.57 | h5888daf_1 23 KB conda-forge 2025-05-07T19:45:49.2587377Z cuda-cudart-dev_linux-64-12.8.57| h3f2d84a_1 377 KB conda-forge 2025-05-07T19:45:49.2587919Z cuda-cudart-static-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:45:49.2588493Z cuda-cudart-static_linux-64-12.8.57| h3f2d84a_1 950 KB conda-forge 2025-05-07T19:45:49.2589056Z cuda-cudart_linux-64-12.8.57| h3f2d84a_1 188 KB conda-forge 2025-05-07T19:45:49.2589563Z cuda-cuobjdump-12.8.55 | hbd13f7d_0 227 KB conda-forge 2025-05-07T19:45:49.2590061Z cuda-cupti-12.8.57 | hbd13f7d_0 1.8 MB conda-forge 2025-05-07T19:45:49.2590617Z cuda-cupti-dev-12.8.57 | h5888daf_0 4.0 MB conda-forge 2025-05-07T19:45:49.2591104Z cuda-cuxxfilt-12.8.55 | hbd13f7d_0 211 KB conda-forge 2025-05-07T19:45:49.2591612Z cuda-driver-dev-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:45:49.2592153Z cuda-driver-dev_linux-64-12.8.90| h3f2d84a_1 36 KB conda-forge 2025-05-07T19:45:49.2592651Z cuda-gdb-12.8.55 | h50b4baa_0 353 KB conda-forge 2025-05-07T19:45:49.2593137Z cuda-libraries-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:45:49.2593642Z cuda-libraries-dev-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:45:49.2594154Z cuda-nsight-12.8.55 | h7938cbb_0 113.2 MB conda-forge 2025-05-07T19:45:49.2594626Z cuda-nvcc-12.8.61 | hcdd1206_0 23 KB conda-forge 
2025-05-07T19:45:49.2595118Z cuda-nvcc-dev_linux-64-12.8.61| he91c749_1 12.7 MB conda-forge 2025-05-07T19:45:49.2595640Z cuda-nvcc-impl-12.8.61 | h85509e4_1 25 KB conda-forge 2025-05-07T19:45:49.2596125Z cuda-nvcc-tools-12.8.61 | he02047a_1 24.5 MB conda-forge 2025-05-07T19:45:49.2596634Z cuda-nvcc_linux-64-12.8.61 | h04802cd_0 25 KB conda-forge 2025-05-07T19:45:49.2597129Z cuda-nvdisasm-12.8.55 | hbd13f7d_0 4.9 MB conda-forge 2025-05-07T19:45:49.2597624Z cuda-nvml-dev-12.8.55 | hbd13f7d_0 134 KB conda-forge 2025-05-07T19:45:49.2598110Z cuda-nvprof-12.8.57 | hbd13f7d_0 2.5 MB conda-forge 2025-05-07T19:45:49.2598585Z cuda-nvprune-12.8.55 | hbd13f7d_0 68 KB conda-forge 2025-05-07T19:45:49.2599066Z cuda-nvrtc-12.8.61 | hbd13f7d_0 63.1 MB conda-forge 2025-05-07T19:45:49.2599538Z cuda-nvrtc-dev-12.8.61 | h5888daf_0 34 KB conda-forge 2025-05-07T19:45:49.2600028Z cuda-nvtx-12.8.55 | hbd13f7d_0 31 KB conda-forge 2025-05-07T19:45:49.2600673Z cuda-nvvm-dev_linux-64-12.8.61| ha770c72_1 25 KB conda-forge 2025-05-07T19:45:49.2601198Z cuda-nvvm-impl-12.8.61 | he02047a_1 20.8 MB conda-forge 2025-05-07T19:45:49.2601703Z cuda-nvvm-tools-12.8.61 | he02047a_1 23.5 MB conda-forge 2025-05-07T19:45:49.2602174Z cuda-nvvp-12.8.57 | hbd13f7d_0 112.4 MB conda-forge 2025-05-07T19:45:49.2602652Z cuda-opencl-12.8.55 | hbd13f7d_0 29 KB conda-forge 2025-05-07T19:45:49.2603139Z cuda-opencl-dev-12.8.55 | h5888daf_0 95 KB conda-forge 2025-05-07T19:45:49.2603664Z cuda-profiler-api-12.8.55 | h7938cbb_0 22 KB conda-forge 2025-05-07T19:45:49.2604177Z cuda-runtime-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:45:49.2604782Z cuda-sanitizer-api-12.8.55 | hbd13f7d_0 8.8 MB conda-forge 2025-05-07T19:45:49.2605411Z cuda-toolkit-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:45:49.2605879Z cuda-tools-12.8.0 | ha770c72_0 19 KB conda-forge 2025-05-07T19:45:49.2606358Z cuda-version-12.8 | h5d125a7_3 21 KB conda-forge 2025-05-07T19:45:49.2606854Z cuda-visual-tools-12.8.0 | ha770c72_0 20 KB conda-forge 
2025-05-07T19:45:49.2607364Z cxx-compiler-1.5.2 | hf52228f_0 6 KB conda-forge 2025-05-07T19:45:49.2607827Z dbus-1.13.6 | h5008d03_3 604 KB conda-forge 2025-05-07T19:45:49.2608235Z gcc-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:45:49.2608684Z gds-tools-1.13.0.11 | h5888daf_0 37.9 MB conda-forge 2025-05-07T19:45:49.2609111Z gmp-6.3.0 | hac33072_2 449 KB conda-forge 2025-05-07T19:45:49.2609528Z gxx-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:45:49.2609959Z libcap-2.75 | h39aace5_0 118 KB conda-forge 2025-05-07T19:45:49.2610427Z libcublas-12.8.3.14 | h9ab20c4_0 460.2 MB conda-forge 2025-05-07T19:45:49.2610922Z libcublas-dev-12.8.3.14 | h9ab20c4_0 89 KB conda-forge 2025-05-07T19:45:49.2611397Z libcufft-11.3.3.41 | hbd13f7d_0 147.4 MB conda-forge 2025-05-07T19:45:49.2611879Z libcufft-dev-11.3.3.41 | h5888daf_0 33 KB conda-forge 2025-05-07T19:45:49.2612351Z libcufile-1.13.0.11 | h12f29b5_0 939 KB conda-forge 2025-05-07T19:45:49.2612846Z libcufile-dev-1.13.0.11 | h5888daf_0 35 KB conda-forge 2025-05-07T19:45:49.2613340Z libcurand-10.3.9.55 | hbd13f7d_0 43.6 MB conda-forge 2025-05-07T19:45:49.2613820Z libcurand-dev-10.3.9.55 | h5888daf_0 265 KB conda-forge 2025-05-07T19:45:49.2614329Z libcusolver-11.7.2.55 | h9ab20c4_0 156.9 MB conda-forge 2025-05-07T19:45:49.2614928Z libcusolver-dev-11.7.2.55 | h9ab20c4_0 59 KB conda-forge 2025-05-07T19:45:49.2615429Z libcusparse-12.5.7.53 | hbd13f7d_0 164.9 MB conda-forge 2025-05-07T19:45:49.2615910Z libcusparse-dev-12.5.7.53 | h5888daf_0 51 KB conda-forge 2025-05-07T19:45:49.2616411Z libgcrypt-lib-1.11.0 | hb9d3cd8_2 572 KB conda-forge 2025-05-07T19:45:49.2616883Z libglvnd-1.7.0 | ha4b6fd6_2 129 KB conda-forge 2025-05-07T19:45:49.2617330Z libgpg-error-1.55 | h3f2d84a_0 305 KB conda-forge 2025-05-07T19:45:49.2617781Z libnl-3.11.0 | hb9d3cd8_0 724 KB conda-forge 2025-05-07T19:45:49.2618203Z libnpp-12.3.3.65 | hbd13f7d_0 130.6 MB conda-forge 2025-05-07T19:45:49.2618667Z libnpp-dev-12.3.3.65 | h5888daf_0 443 KB conda-forge 
2025-05-07T19:45:49.2619126Z libnuma-2.0.18 | h4ab18f5_2 42 KB conda-forge 2025-05-07T19:45:49.2619637Z libnvfatbin-12.8.55 | hbd13f7d_0 793 KB conda-forge 2025-05-07T19:45:49.2620300Z libnvfatbin-dev-12.8.55 | h5888daf_0 26 KB conda-forge 2025-05-07T19:45:49.2620795Z libnvjitlink-12.8.61 | hbd13f7d_0 28.7 MB conda-forge 2025-05-07T19:45:49.2621308Z libnvjitlink-dev-12.8.61 | h5888daf_0 25 KB conda-forge 2025-05-07T19:45:49.2621795Z libnvjpeg-12.3.5.57 | h97fd463_0 3.0 MB conda-forge 2025-05-07T19:45:49.2622285Z libnvjpeg-dev-12.3.5.57 | ha770c72_0 31 KB conda-forge 2025-05-07T19:45:49.2622774Z libopengl-1.7.0 | ha4b6fd6_2 50 KB conda-forge 2025-05-07T19:45:49.2623238Z libsystemd0-257.4 | h4e0b6ca_1 477 KB conda-forge 2025-05-07T19:45:49.2623845Z libudev1-257.4 | hbe16f8c_1 141 KB conda-forge 2025-05-07T19:45:49.2624306Z libxkbcommon-1.9.2 | h65c71a3_0 660 KB conda-forge 2025-05-07T19:45:49.2624790Z libxkbfile-1.1.0 | h166bdaf_1 111 KB conda-forge 2025-05-07T19:45:49.2625238Z libxml2-2.13.8 | h4bc477f_0 675 KB conda-forge 2025-05-07T19:45:49.2625672Z lz4-c-1.10.0 | h5888daf_1 163 KB conda-forge 2025-05-07T19:45:49.2626161Z nsight-compute-2025.1.0.14 | hb5ebaad_0 320.6 MB conda-forge 2025-05-07T19:45:49.2626627Z nspr-4.36 | h5888daf_0 225 KB conda-forge 2025-05-07T19:45:49.2627048Z nss-3.111 | h159eef7_0 1.9 MB conda-forge 2025-05-07T19:45:49.2627462Z ocl-icd-2.3.3 | hb9d3cd8_0 104 KB conda-forge 2025-05-07T19:45:49.2627952Z opencl-headers-2024.10.24 | h5888daf_0 53 KB conda-forge 2025-05-07T19:45:49.2628449Z rdma-core-57.0 | h5888daf_0 1.2 MB conda-forge 2025-05-07T19:45:49.2628889Z wayland-1.23.1 | h3e06ad9_0 314 KB conda-forge 2025-05-07T19:45:49.2629332Z xcb-util-0.4.1 | hb711507_2 19 KB conda-forge 2025-05-07T19:45:49.2629793Z xcb-util-cursor-0.1.5 | hb9d3cd8_0 20 KB conda-forge 2025-05-07T19:45:49.2630291Z xcb-util-image-0.4.0 | hb711507_2 24 KB conda-forge 2025-05-07T19:45:49.2630777Z xcb-util-keysyms-0.4.1 | hb711507_0 14 KB conda-forge 
2025-05-07T19:45:49.2631303Z xcb-util-renderutil-0.3.10 | hb711507_0 17 KB conda-forge 2025-05-07T19:45:49.2631805Z xcb-util-wm-0.4.2 | hb711507_0 50 KB conda-forge 2025-05-07T19:45:49.2632376Z xkeyboard-config-2.44 | hb9d3cd8_0 384 KB conda-forge 2025-05-07T19:45:49.2632882Z xorg-libxcomposite-0.4.6 | hb9d3cd8_2 13 KB conda-forge 2025-05-07T19:45:49.2633358Z xorg-libxdamage-1.1.6 | hb9d3cd8_0 13 KB conda-forge 2025-05-07T19:45:49.2633787Z ------------------------------------------------------------ 2025-05-07T19:45:49.2634118Z Total: 1.86 GB 2025-05-07T19:45:49.2634339Z 2025-05-07T19:45:49.2634465Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:49.2634685Z 2025-05-07T19:45:49.2634862Z attr conda-forge/linux-64::attr-2.5.1-h166bdaf_1 2025-05-07T19:45:49.2635272Z binutils conda-forge/linux-64::binutils-2.40-h4852527_7 2025-05-07T19:45:49.2635737Z c-compiler conda-forge/linux-64::c-compiler-1.5.2-h0b41bf4_0 2025-05-07T19:45:49.2636159Z cuda conda-forge/noarch::cuda-12.8.0-ha804496_0 2025-05-07T19:45:49.2636636Z cuda-cccl_linux-64 conda-forge/noarch::cuda-cccl_linux-64-12.8.55-ha770c72_1 2025-05-07T19:45:49.2637247Z cuda-command-line~ conda-forge/linux-64::cuda-command-line-tools-12.8.0-ha770c72_0 2025-05-07T19:45:49.2637820Z cuda-compiler conda-forge/noarch::cuda-compiler-12.8.0-hbad6d8a_0 2025-05-07T19:45:49.2638377Z cuda-crt-dev_linu~ conda-forge/noarch::cuda-crt-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:45:49.2638929Z cuda-crt-tools conda-forge/linux-64::cuda-crt-tools-12.8.61-ha770c72_1 2025-05-07T19:45:49.2639458Z cuda-cudart conda-forge/linux-64::cuda-cudart-12.8.57-h5888daf_1 2025-05-07T19:45:49.2639988Z cuda-cudart-dev conda-forge/linux-64::cuda-cudart-dev-12.8.57-h5888daf_1 2025-05-07T19:45:49.2640562Z cuda-cudart-dev_l~ conda-forge/noarch::cuda-cudart-dev_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:45:49.2641178Z cuda-cudart-static conda-forge/linux-64::cuda-cudart-static-12.8.57-h5888daf_1 2025-05-07T19:45:49.2641795Z cuda-cudart-stati~ 
conda-forge/noarch::cuda-cudart-static_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:45:49.2642480Z cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:45:49.2643130Z cuda-cuobjdump conda-forge/linux-64::cuda-cuobjdump-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2643642Z cuda-cupti conda-forge/linux-64::cuda-cupti-12.8.57-hbd13f7d_0 2025-05-07T19:45:49.2644159Z cuda-cupti-dev conda-forge/linux-64::cuda-cupti-dev-12.8.57-h5888daf_0 2025-05-07T19:45:49.2644685Z cuda-cuxxfilt conda-forge/linux-64::cuda-cuxxfilt-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2645236Z cuda-driver-dev conda-forge/linux-64::cuda-driver-dev-12.8.57-h5888daf_1 2025-05-07T19:45:49.2645826Z cuda-driver-dev_l~ conda-forge/noarch::cuda-driver-dev_linux-64-12.8.90-h3f2d84a_1 2025-05-07T19:45:49.2646350Z cuda-gdb conda-forge/linux-64::cuda-gdb-12.8.55-h50b4baa_0 2025-05-07T19:45:49.2646852Z cuda-libraries conda-forge/linux-64::cuda-libraries-12.8.0-ha770c72_0 2025-05-07T19:45:49.2647427Z cuda-libraries-dev conda-forge/linux-64::cuda-libraries-dev-12.8.0-ha770c72_0 2025-05-07T19:45:49.2647974Z cuda-nsight conda-forge/linux-64::cuda-nsight-12.8.55-h7938cbb_0 2025-05-07T19:45:49.2648461Z cuda-nvcc conda-forge/linux-64::cuda-nvcc-12.8.61-hcdd1206_0 2025-05-07T19:45:49.2648977Z cuda-nvcc-dev_lin~ conda-forge/noarch::cuda-nvcc-dev_linux-64-12.8.61-he91c749_1 2025-05-07T19:45:49.2649548Z cuda-nvcc-impl conda-forge/linux-64::cuda-nvcc-impl-12.8.61-h85509e4_1 2025-05-07T19:45:49.2650105Z cuda-nvcc-tools conda-forge/linux-64::cuda-nvcc-tools-12.8.61-he02047a_1 2025-05-07T19:45:49.2650658Z cuda-nvcc_linux-64 conda-forge/linux-64::cuda-nvcc_linux-64-12.8.61-h04802cd_0 2025-05-07T19:45:49.2651213Z cuda-nvdisasm conda-forge/linux-64::cuda-nvdisasm-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2651732Z cuda-nvml-dev conda-forge/linux-64::cuda-nvml-dev-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2652254Z cuda-nvprof conda-forge/linux-64::cuda-nvprof-12.8.57-hbd13f7d_0 2025-05-07T19:45:49.2652774Z 
cuda-nvprune conda-forge/linux-64::cuda-nvprune-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2653268Z cuda-nvrtc conda-forge/linux-64::cuda-nvrtc-12.8.61-hbd13f7d_0 2025-05-07T19:45:49.2653784Z cuda-nvrtc-dev conda-forge/linux-64::cuda-nvrtc-dev-12.8.61-h5888daf_0 2025-05-07T19:45:49.2654275Z cuda-nvtx conda-forge/linux-64::cuda-nvtx-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2654807Z cuda-nvvm-dev_lin~ conda-forge/noarch::cuda-nvvm-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:45:49.2655376Z cuda-nvvm-impl conda-forge/linux-64::cuda-nvvm-impl-12.8.61-he02047a_1 2025-05-07T19:45:49.2655912Z cuda-nvvm-tools conda-forge/linux-64::cuda-nvvm-tools-12.8.61-he02047a_1 2025-05-07T19:45:49.2656431Z cuda-nvvp conda-forge/linux-64::cuda-nvvp-12.8.57-hbd13f7d_0 2025-05-07T19:45:49.2656901Z cuda-opencl conda-forge/linux-64::cuda-opencl-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2657454Z cuda-opencl-dev conda-forge/linux-64::cuda-opencl-dev-12.8.55-h5888daf_0 2025-05-07T19:45:49.2658048Z cuda-profiler-api conda-forge/linux-64::cuda-profiler-api-12.8.55-h7938cbb_0 2025-05-07T19:45:49.2658587Z cuda-runtime conda-forge/noarch::cuda-runtime-12.8.0-ha804496_0 2025-05-07T19:45:49.2659148Z cuda-sanitizer-api conda-forge/linux-64::cuda-sanitizer-api-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2659773Z cuda-toolkit conda-forge/noarch::cuda-toolkit-12.8.0-ha804496_0 2025-05-07T19:45:49.2660470Z cuda-tools conda-forge/linux-64::cuda-tools-12.8.0-ha770c72_0 2025-05-07T19:45:49.2661000Z cuda-version conda-forge/noarch::cuda-version-12.8-h5d125a7_3 2025-05-07T19:45:49.2661571Z cuda-visual-tools conda-forge/linux-64::cuda-visual-tools-12.8.0-ha770c72_0 2025-05-07T19:45:49.2662178Z cxx-compiler conda-forge/linux-64::cxx-compiler-1.5.2-hf52228f_0 2025-05-07T19:45:49.2662661Z dbus conda-forge/linux-64::dbus-1.13.6-h5008d03_3 2025-05-07T19:45:49.2663093Z gcc conda-forge/linux-64::gcc-11.4.0-h602e360_13 2025-05-07T19:45:49.2663681Z gds-tools conda-forge/linux-64::gds-tools-1.13.0.11-h5888daf_0 
2025-05-07T19:45:49.2664157Z gmp conda-forge/linux-64::gmp-6.3.0-hac33072_2 2025-05-07T19:45:49.2664580Z gxx conda-forge/linux-64::gxx-11.4.0-h602e360_13 2025-05-07T19:45:49.2665004Z libcap conda-forge/linux-64::libcap-2.75-h39aace5_0 2025-05-07T19:45:49.2665502Z libcublas conda-forge/linux-64::libcublas-12.8.3.14-h9ab20c4_0 2025-05-07T19:45:49.2666055Z libcublas-dev conda-forge/linux-64::libcublas-dev-12.8.3.14-h9ab20c4_0 2025-05-07T19:45:49.2666616Z libcufft conda-forge/linux-64::libcufft-11.3.3.41-hbd13f7d_0 2025-05-07T19:45:49.2667162Z libcufft-dev conda-forge/linux-64::libcufft-dev-11.3.3.41-h5888daf_0 2025-05-07T19:45:49.2667702Z libcufile conda-forge/linux-64::libcufile-1.13.0.11-h12f29b5_0 2025-05-07T19:45:49.2668512Z libcufile-dev conda-forge/linux-64::libcufile-dev-1.13.0.11-h5888daf_0 2025-05-07T19:45:49.2669070Z libcurand conda-forge/linux-64::libcurand-10.3.9.55-hbd13f7d_0 2025-05-07T19:45:49.2669632Z libcurand-dev conda-forge/linux-64::libcurand-dev-10.3.9.55-h5888daf_0 2025-05-07T19:45:49.2670209Z libcusolver conda-forge/linux-64::libcusolver-11.7.2.55-h9ab20c4_0 2025-05-07T19:45:49.2670798Z libcusolver-dev conda-forge/linux-64::libcusolver-dev-11.7.2.55-h9ab20c4_0 2025-05-07T19:45:49.2671402Z libcusparse conda-forge/linux-64::libcusparse-12.5.7.53-hbd13f7d_0 2025-05-07T19:45:49.2671982Z libcusparse-dev conda-forge/linux-64::libcusparse-dev-12.5.7.53-h5888daf_0 2025-05-07T19:45:49.2672579Z libgcrypt-lib conda-forge/linux-64::libgcrypt-lib-1.11.0-hb9d3cd8_2 2025-05-07T19:45:49.2673114Z libglvnd conda-forge/linux-64::libglvnd-1.7.0-ha4b6fd6_2 2025-05-07T19:45:49.2673614Z libgpg-error conda-forge/linux-64::libgpg-error-1.55-h3f2d84a_0 2025-05-07T19:45:49.2674114Z libnl conda-forge/linux-64::libnl-3.11.0-hb9d3cd8_0 2025-05-07T19:45:49.2674569Z libnpp conda-forge/linux-64::libnpp-12.3.3.65-hbd13f7d_0 2025-05-07T19:45:49.2675306Z libnpp-dev conda-forge/linux-64::libnpp-dev-12.3.3.65-h5888daf_0 2025-05-07T19:45:49.2675930Z libnuma 
conda-forge/linux-64::libnuma-2.0.18-h4ab18f5_2 2025-05-07T19:45:49.2676431Z libnvfatbin conda-forge/linux-64::libnvfatbin-12.8.55-hbd13f7d_0 2025-05-07T19:45:49.2677014Z libnvfatbin-dev conda-forge/linux-64::libnvfatbin-dev-12.8.55-h5888daf_0 2025-05-07T19:45:49.2677594Z libnvjitlink conda-forge/linux-64::libnvjitlink-12.8.61-hbd13f7d_0 2025-05-07T19:45:49.2678196Z libnvjitlink-dev conda-forge/linux-64::libnvjitlink-dev-12.8.61-h5888daf_0 2025-05-07T19:45:49.2678780Z libnvjpeg conda-forge/linux-64::libnvjpeg-12.3.5.57-h97fd463_0 2025-05-07T19:45:49.2679328Z libnvjpeg-dev conda-forge/linux-64::libnvjpeg-dev-12.3.5.57-ha770c72_0 2025-05-07T19:45:49.2679884Z libopengl conda-forge/linux-64::libopengl-1.7.0-ha4b6fd6_2 2025-05-07T19:45:49.2680393Z libsystemd0 conda-forge/linux-64::libsystemd0-257.4-h4e0b6ca_1 2025-05-07T19:45:49.2680900Z libudev1 conda-forge/linux-64::libudev1-257.4-hbe16f8c_1 2025-05-07T19:45:49.2681421Z libxkbcommon conda-forge/linux-64::libxkbcommon-1.9.2-h65c71a3_0 2025-05-07T19:45:49.2681944Z libxkbfile conda-forge/linux-64::libxkbfile-1.1.0-h166bdaf_1 2025-05-07T19:45:49.2682441Z libxml2 conda-forge/linux-64::libxml2-2.13.8-h4bc477f_0 2025-05-07T19:45:49.2682883Z lz4-c conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1 2025-05-07T19:45:49.2683416Z nsight-compute conda-forge/linux-64::nsight-compute-2025.1.0.14-hb5ebaad_0 2025-05-07T19:45:49.2683952Z nspr conda-forge/linux-64::nspr-4.36-h5888daf_0 2025-05-07T19:45:49.2684353Z nss conda-forge/linux-64::nss-3.111-h159eef7_0 2025-05-07T19:45:49.2684943Z ocl-icd conda-forge/linux-64::ocl-icd-2.3.3-hb9d3cd8_0 2025-05-07T19:45:49.2685768Z opencl-headers conda-forge/linux-64::opencl-headers-2024.10.24-h5888daf_0 2025-05-07T19:45:49.2686330Z rdma-core conda-forge/linux-64::rdma-core-57.0-h5888daf_0 2025-05-07T19:45:49.2686815Z wayland conda-forge/linux-64::wayland-1.23.1-h3e06ad9_0 2025-05-07T19:45:49.2687282Z xcb-util conda-forge/linux-64::xcb-util-0.4.1-hb711507_2 2025-05-07T19:45:49.2687826Z xcb-util-cursor 
conda-forge/linux-64::xcb-util-cursor-0.1.5-hb9d3cd8_0 2025-05-07T19:45:49.2688396Z xcb-util-image conda-forge/linux-64::xcb-util-image-0.4.0-hb711507_2 2025-05-07T19:45:49.2688985Z xcb-util-keysyms conda-forge/linux-64::xcb-util-keysyms-0.4.1-hb711507_0 2025-05-07T19:45:49.2689621Z xcb-util-renderut~ conda-forge/linux-64::xcb-util-renderutil-0.3.10-hb711507_0 2025-05-07T19:45:49.2690192Z xcb-util-wm conda-forge/linux-64::xcb-util-wm-0.4.2-hb711507_0 2025-05-07T19:45:49.2690758Z xkeyboard-config conda-forge/linux-64::xkeyboard-config-2.44-hb9d3cd8_0 2025-05-07T19:45:49.2691386Z xorg-libxcomposite conda-forge/linux-64::xorg-libxcomposite-0.4.6-hb9d3cd8_2 2025-05-07T19:45:49.2692021Z xorg-libxdamage conda-forge/linux-64::xorg-libxdamage-1.1.6-hb9d3cd8_0 2025-05-07T19:45:49.2692372Z 2025-05-07T19:45:49.2692393Z 2025-05-07T19:45:49.2692398Z 2025-05-07T19:45:49.2692564Z Downloading and Extracting Packages: ...working... 2025-05-07T19:45:49.2692961Z libcublas-12.8.3.14 | 460.2 MB | | 0% 2025-05-07T19:45:49.2693228Z 2025-05-07T19:45:49.2693651Z nsight-compute-2025. 
| 320.6 MB | | 0%  2025-05-07T19:45:49.2693917Z 2025-05-07T19:45:49.2693921Z 2025-05-07T19:45:49.2704888Z libcusparse-12.5.7.5 | 164.9 MB | | 0%  2025-05-07T19:45:49.2705171Z 2025-05-07T19:45:49.2705174Z 2025-05-07T19:45:49.2705178Z 2025-05-07T19:45:49.2722611Z libcusolver-11.7.2.5 | 156.9 MB | | 0%  2025-05-07T19:45:49.2722950Z 2025-05-07T19:45:49.2722954Z 2025-05-07T19:45:49.2722958Z 2025-05-07T19:45:49.2722961Z 2025-05-07T19:45:49.2731937Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:45:49.2732223Z 2025-05-07T19:45:49.2732227Z 2025-05-07T19:45:49.2732230Z 2025-05-07T19:45:49.2732234Z 2025-05-07T19:45:49.2734822Z 2025-05-07T19:45:49.2735340Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:45:49.2735660Z 2025-05-07T19:45:49.2735679Z 2025-05-07T19:45:49.2735684Z 2025-05-07T19:45:49.2735687Z 2025-05-07T19:45:49.2735691Z 2025-05-07T19:45:49.2735709Z 2025-05-07T19:45:49.2736057Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:45:49.2736530Z 2025-05-07T19:45:49.2736534Z 2025-05-07T19:45:49.2736537Z 2025-05-07T19:45:49.2736540Z 2025-05-07T19:45:49.2736544Z 2025-05-07T19:45:49.2736547Z 2025-05-07T19:45:49.2736550Z 2025-05-07T19:45:49.2736828Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:45:49.2737169Z 2025-05-07T19:45:49.2737177Z 2025-05-07T19:45:49.2737182Z 2025-05-07T19:45:49.2737187Z 2025-05-07T19:45:49.2737201Z 2025-05-07T19:45:49.2737206Z 2025-05-07T19:45:49.2737214Z 2025-05-07T19:45:49.2737220Z 2025-05-07T19:45:49.2737609Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:45:49.2737904Z 2025-05-07T19:45:49.2737908Z 2025-05-07T19:45:49.2737911Z 2025-05-07T19:45:49.2737915Z 2025-05-07T19:45:49.2737918Z 2025-05-07T19:45:49.2737921Z 2025-05-07T19:45:49.2737924Z 2025-05-07T19:45:49.2737927Z 2025-05-07T19:45:49.2737931Z 2025-05-07T19:45:49.2738319Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:45:49.2738713Z 2025-05-07T19:45:49.2738717Z 2025-05-07T19:45:49.2738721Z 2025-05-07T19:45:49.2738724Z 2025-05-07T19:45:49.2738727Z 
2025-05-07T19:45:49.2738731Z 2025-05-07T19:45:49.2738734Z 2025-05-07T19:45:49.2738737Z 2025-05-07T19:45:49.2738741Z 2025-05-07T19:45:49.2738906Z 2025-05-07T19:45:49.2739248Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:45:49.2741857Z 2025-05-07T19:45:49.2741868Z 2025-05-07T19:45:49.2741874Z 2025-05-07T19:45:49.2741882Z 2025-05-07T19:45:49.2741887Z 2025-05-07T19:45:49.2741893Z 2025-05-07T19:45:49.2741896Z 2025-05-07T19:45:49.2741899Z 2025-05-07T19:45:49.2741902Z 2025-05-07T19:45:49.2741906Z 2025-05-07T19:45:49.2741910Z 2025-05-07T19:45:49.2742349Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:45:49.2742692Z 2025-05-07T19:45:49.2742697Z 2025-05-07T19:45:49.2742702Z 2025-05-07T19:45:49.2742707Z 2025-05-07T19:45:49.2742712Z 2025-05-07T19:45:49.2742717Z 2025-05-07T19:45:49.2742741Z 2025-05-07T19:45:49.2742747Z 2025-05-07T19:45:49.2742752Z 2025-05-07T19:45:49.2742759Z 2025-05-07T19:45:49.2742765Z 2025-05-07T19:45:49.2742771Z 2025-05-07T19:45:49.2743214Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:45:49.2743543Z 2025-05-07T19:45:49.2743546Z 2025-05-07T19:45:49.2743550Z 2025-05-07T19:45:49.2743557Z 2025-05-07T19:45:49.2743576Z 2025-05-07T19:45:49.2743579Z 2025-05-07T19:45:49.2743582Z 2025-05-07T19:45:49.2743585Z 2025-05-07T19:45:49.2743589Z 2025-05-07T19:45:49.2743592Z 2025-05-07T19:45:49.2743595Z 2025-05-07T19:45:49.2743598Z 2025-05-07T19:45:49.2743602Z 2025-05-07T19:45:49.2744021Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:45:49.2744400Z 2025-05-07T19:45:49.2744417Z 2025-05-07T19:45:49.2744421Z 2025-05-07T19:45:49.2744424Z 2025-05-07T19:45:49.2744428Z 2025-05-07T19:45:49.2744431Z 2025-05-07T19:45:49.2744435Z 2025-05-07T19:45:49.2744438Z 2025-05-07T19:45:49.2744441Z 2025-05-07T19:45:49.2744445Z 2025-05-07T19:45:49.2744448Z 2025-05-07T19:45:49.2744452Z 2025-05-07T19:45:49.2744455Z 2025-05-07T19:45:49.2744458Z 2025-05-07T19:45:49.2744798Z cuda-nvvm-impl-12.8. 
| 20.8 MB | | 0%  2025-05-07T19:45:49.2745302Z 2025-05-07T19:45:49.2745306Z 2025-05-07T19:45:49.2745314Z 2025-05-07T19:45:49.2745318Z 2025-05-07T19:45:49.2745321Z 2025-05-07T19:45:49.2745324Z 2025-05-07T19:45:49.2745328Z 2025-05-07T19:45:49.2745331Z 2025-05-07T19:45:49.2745334Z 2025-05-07T19:45:49.2745338Z 2025-05-07T19:45:49.2745341Z 2025-05-07T19:45:49.2745344Z 2025-05-07T19:45:49.2745348Z 2025-05-07T19:45:49.2745351Z 2025-05-07T19:45:49.2745354Z 2025-05-07T19:45:49.2745681Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:45:49.2746116Z 2025-05-07T19:45:49.2746120Z 2025-05-07T19:45:49.2746123Z 2025-05-07T19:45:49.2746126Z 2025-05-07T19:45:49.2746130Z 2025-05-07T19:45:49.2746133Z 2025-05-07T19:45:49.2746137Z 2025-05-07T19:45:49.2746140Z 2025-05-07T19:45:49.2746143Z 2025-05-07T19:45:49.2746146Z 2025-05-07T19:45:49.2746150Z 2025-05-07T19:45:49.2746175Z 2025-05-07T19:45:49.2746187Z 2025-05-07T19:45:49.2746192Z 2025-05-07T19:45:49.2746199Z 2025-05-07T19:45:49.2746204Z 2025-05-07T19:45:49.2746611Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:45:49.2746971Z 2025-05-07T19:45:49.2746993Z 2025-05-07T19:45:49.2746998Z 2025-05-07T19:45:49.2747002Z 2025-05-07T19:45:49.2747008Z 2025-05-07T19:45:49.2747014Z 2025-05-07T19:45:49.2747019Z 2025-05-07T19:45:49.2747023Z 2025-05-07T19:45:49.2747030Z 2025-05-07T19:45:49.2747036Z 2025-05-07T19:45:49.2747041Z 2025-05-07T19:45:49.2747048Z 2025-05-07T19:45:49.2747053Z 2025-05-07T19:45:49.2747059Z 2025-05-07T19:45:49.2747064Z 2025-05-07T19:45:49.2747070Z 2025-05-07T19:45:49.2747076Z 2025-05-07T19:45:49.2747470Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:45:49.2747823Z 2025-05-07T19:45:49.2747827Z 2025-05-07T19:45:49.2747830Z 2025-05-07T19:45:49.2747833Z 2025-05-07T19:45:49.2747836Z 2025-05-07T19:45:49.2747944Z 2025-05-07T19:45:49.2747947Z 2025-05-07T19:45:49.2747951Z 2025-05-07T19:45:49.2747954Z 2025-05-07T19:45:49.2748013Z 2025-05-07T19:45:49.2748018Z 2025-05-07T19:45:49.2748021Z 2025-05-07T19:45:49.2748024Z 
2025-05-07T19:45:49.2748028Z 2025-05-07T19:45:49.2748031Z 2025-05-07T19:45:49.2748034Z 2025-05-07T19:45:49.2748038Z 2025-05-07T19:45:49.2748065Z 2025-05-07T19:45:49.2748390Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%  2025-05-07T19:45:49.2748730Z 2025-05-07T19:45:49.2748734Z 2025-05-07T19:45:49.2748737Z 2025-05-07T19:45:49.2748741Z 2025-05-07T19:45:49.2748744Z 2025-05-07T19:45:49.2748748Z 2025-05-07T19:45:49.2748767Z 2025-05-07T19:45:49.2748771Z 2025-05-07T19:45:49.2748774Z 2025-05-07T19:45:49.2748777Z 2025-05-07T19:45:49.2748781Z 2025-05-07T19:45:49.2748784Z 2025-05-07T19:45:49.2748788Z 2025-05-07T19:45:49.2748791Z 2025-05-07T19:45:49.2748794Z 2025-05-07T19:45:49.2748797Z 2025-05-07T19:45:49.2748805Z 2025-05-07T19:45:49.2748808Z 2025-05-07T19:45:49.2748811Z 2025-05-07T19:45:49.3674308Z ... (more hidden) ... 2025-05-07T19:45:49.3681970Z libcublas-12.8.3.14 | 460.2 MB | | 1% 2025-05-07T19:45:49.3682341Z 2025-05-07T19:45:49.3698120Z nsight-compute-2025. | 320.6 MB | | 0%  2025-05-07T19:45:49.3698435Z 2025-05-07T19:45:49.3698440Z 2025-05-07T19:45:49.3790469Z libcusparse-12.5.7.5 | 164.9 MB | | 0%  2025-05-07T19:45:49.3790774Z 2025-05-07T19:45:49.3790779Z 2025-05-07T19:45:49.3790783Z 2025-05-07T19:45:49.3790792Z 2025-05-07T19:45:49.4210053Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:45:49.4210401Z 2025-05-07T19:45:49.4210406Z 2025-05-07T19:45:49.4210410Z 2025-05-07T19:45:49.4677642Z libcusolver-11.7.2.5 | 156.9 MB | | 0%  2025-05-07T19:45:49.4686038Z libcublas-12.8.3.14 | 460.2 MB | 3 | 3% 2025-05-07T19:45:49.4688818Z 2025-05-07T19:45:49.4696612Z nsight-compute-2025. 
| 320.6 MB | 2 | 2%  2025-05-07T19:45:49.4696922Z 2025-05-07T19:45:49.4696938Z 2025-05-07T19:45:49.4794113Z libcusparse-12.5.7.5 | 164.9 MB | 2 | 3%  2025-05-07T19:45:49.4794426Z 2025-05-07T19:45:49.4794430Z 2025-05-07T19:45:49.4794434Z 2025-05-07T19:45:49.4794437Z 2025-05-07T19:45:49.5211303Z libcufft-11.3.3.41 | 147.4 MB | 4 | 5%  2025-05-07T19:45:49.5211629Z 2025-05-07T19:45:49.5211634Z 2025-05-07T19:45:49.5211759Z 2025-05-07T19:45:49.5689729Z libcusolver-11.7.2.5 | 156.9 MB | 2 | 2%  2025-05-07T19:45:49.5690060Z 2025-05-07T19:45:49.5700004Z nsight-compute-2025. | 320.6 MB | 3 | 4%  2025-05-07T19:45:49.5701166Z 2025-05-07T19:45:49.5701179Z 2025-05-07T19:45:49.5795056Z libcusparse-12.5.7.5 | 164.9 MB | 5 | 5%  2025-05-07T19:45:49.5795415Z 2025-05-07T19:45:49.5795423Z 2025-05-07T19:45:49.5795428Z 2025-05-07T19:45:49.5795462Z 2025-05-07T19:45:49.5997345Z libcufft-11.3.3.41 | 147.4 MB | 8 | 8%  2025-05-07T19:45:49.6216421Z libcublas-12.8.3.14 | 460.2 MB | 4 | 5% 2025-05-07T19:45:49.6216860Z 2025-05-07T19:45:49.6216901Z 2025-05-07T19:45:49.6216906Z 2025-05-07T19:45:49.6694842Z libcusolver-11.7.2.5 | 156.9 MB | 3 | 4%  2025-05-07T19:45:49.6695207Z 2025-05-07T19:45:49.6700096Z nsight-compute-2025. | 320.6 MB | 5 | 6%  2025-05-07T19:45:49.6700647Z 2025-05-07T19:45:49.6700653Z 2025-05-07T19:45:49.6796385Z libcusparse-12.5.7.5 | 164.9 MB | 8 | 8%  2025-05-07T19:45:49.6796733Z 2025-05-07T19:45:49.6796845Z 2025-05-07T19:45:49.6796852Z 2025-05-07T19:45:49.6796858Z 2025-05-07T19:45:49.7219320Z libcufft-11.3.3.41 | 147.4 MB | #2 | 12%  2025-05-07T19:45:49.7220475Z 2025-05-07T19:45:49.7220489Z 2025-05-07T19:45:49.7220499Z 2025-05-07T19:45:49.7231693Z libcusolver-11.7.2.5 | 156.9 MB | 6 | 6%  2025-05-07T19:45:49.7693707Z libcublas-12.8.3.14 | 460.2 MB | 6 | 6% 2025-05-07T19:45:49.7694031Z 2025-05-07T19:45:49.7701785Z nsight-compute-2025. 
| 320.6 MB | 7 | 8%  2025-05-07T19:45:49.7702101Z 2025-05-07T19:45:49.7702118Z 2025-05-07T19:45:49.7821895Z libcusparse-12.5.7.5 | 164.9 MB | #1 | 11%  2025-05-07T19:45:49.7822210Z 2025-05-07T19:45:49.7822230Z 2025-05-07T19:45:49.7822235Z 2025-05-07T19:45:49.7822239Z 2025-05-07T19:45:49.8218382Z libcufft-11.3.3.41 | 147.4 MB | #5 | 16%  2025-05-07T19:45:49.8218698Z 2025-05-07T19:45:49.8218703Z 2025-05-07T19:45:49.8218707Z 2025-05-07T19:45:49.8231029Z libcusolver-11.7.2.5 | 156.9 MB | 8 | 9%  2025-05-07T19:45:49.8701671Z libcublas-12.8.3.14 | 460.2 MB | 8 | 9% 2025-05-07T19:45:49.8701971Z 2025-05-07T19:45:49.8919555Z nsight-compute-2025. | 320.6 MB | 9 | 10%  2025-05-07T19:45:49.8919856Z 2025-05-07T19:45:49.8919862Z 2025-05-07T19:45:49.9235485Z libcusparse-12.5.7.5 | 164.9 MB | #3 | 14%  2025-05-07T19:45:49.9458415Z libcublas-12.8.3.14 | 460.2 MB | #2 | 13% 2025-05-07T19:45:49.9458897Z 2025-05-07T19:45:49.9458940Z 2025-05-07T19:45:49.9458946Z 2025-05-07T19:45:49.9745628Z libcusolver-11.7.2.5 | 156.9 MB | # | 11%  2025-05-07T19:45:49.9745948Z 2025-05-07T19:45:49.9922001Z nsight-compute-2025. | 320.6 MB | #1 | 12%  2025-05-07T19:45:49.9922436Z 2025-05-07T19:45:49.9922468Z 2025-05-07T19:45:50.0014120Z libcusparse-12.5.7.5 | 164.9 MB | #6 | 17%  2025-05-07T19:45:50.0014431Z 2025-05-07T19:45:50.0014436Z 2025-05-07T19:45:50.0014463Z 2025-05-07T19:45:50.0014467Z 2025-05-07T19:45:50.0232060Z libcufft-11.3.3.41 | 147.4 MB | #9 | 19%  2025-05-07T19:45:50.0458326Z libcublas-12.8.3.14 | 460.2 MB | #5 | 15% 2025-05-07T19:45:50.0458654Z 2025-05-07T19:45:50.0458786Z 2025-05-07T19:45:50.0458793Z 2025-05-07T19:45:50.0775929Z libcusolver-11.7.2.5 | 156.9 MB | #3 | 14%  2025-05-07T19:45:50.0776288Z 2025-05-07T19:45:50.1236129Z nsight-compute-2025. 
| 320.6 MB | #3 | 14%  2025-05-07T19:45:50.1277815Z libcublas-12.8.3.14 | 460.2 MB | #8 | 18% 2025-05-07T19:45:50.1278170Z 2025-05-07T19:45:50.1278175Z 2025-05-07T19:45:50.1458866Z libcusparse-12.5.7.5 | 164.9 MB | #9 | 20%  2025-05-07T19:45:50.1459198Z 2025-05-07T19:45:50.1459203Z 2025-05-07T19:45:50.1459208Z 2025-05-07T19:45:50.1777395Z libcusolver-11.7.2.5 | 156.9 MB | #9 | 19%  2025-05-07T19:45:50.1777722Z 2025-05-07T19:45:50.1865345Z nsight-compute-2025. | 320.6 MB | #6 | 16%  2025-05-07T19:45:50.1865646Z 2025-05-07T19:45:50.1865651Z 2025-05-07T19:45:50.1865654Z 2025-05-07T19:45:50.1865663Z 2025-05-07T19:45:50.2264069Z libcufft-11.3.3.41 | 147.4 MB | ##1 | 22%  2025-05-07T19:45:50.2264394Z 2025-05-07T19:45:50.2264399Z 2025-05-07T19:45:50.2459334Z libcusparse-12.5.7.5 | 164.9 MB | ##3 | 23%  2025-05-07T19:45:50.2459791Z 2025-05-07T19:45:50.2459796Z 2025-05-07T19:45:50.2459800Z 2025-05-07T19:45:50.2866079Z libcusolver-11.7.2.5 | 156.9 MB | ##2 | 23%  2025-05-07T19:45:50.2866559Z 2025-05-07T19:45:50.2867443Z nsight-compute-2025. 
| 320.6 MB | #8 | 18%  2025-05-07T19:45:50.2867736Z 2025-05-07T19:45:50.2867745Z 2025-05-07T19:45:50.2867749Z 2025-05-07T19:45:50.2868697Z 2025-05-07T19:45:50.3183643Z libcufft-11.3.3.41 | 147.4 MB | ##5 | 25%  2025-05-07T19:45:50.3263902Z libcublas-12.8.3.14 | 460.2 MB | ## | 21% 2025-05-07T19:45:50.3264226Z 2025-05-07T19:45:50.3264537Z 2025-05-07T19:45:50.3536944Z libcusparse-12.5.7.5 | 164.9 MB | ##7 | 27%  2025-05-07T19:45:50.3537260Z 2025-05-07T19:45:50.3537265Z 2025-05-07T19:45:50.3537283Z 2025-05-07T19:45:50.3904557Z libcusolver-11.7.2.5 | 156.9 MB | ##5 | 26%  2025-05-07T19:45:50.3904870Z 2025-05-07T19:45:50.3905107Z 2025-05-07T19:45:50.3905114Z 2025-05-07T19:45:50.3905120Z 2025-05-07T19:45:50.4265591Z libcufft-11.3.3.41 | 147.4 MB | ##9 | 29%  2025-05-07T19:45:50.4265928Z 2025-05-07T19:45:50.4265933Z 2025-05-07T19:45:50.4326838Z libcusparse-12.5.7.5 | 164.9 MB | ###2 | 33%  2025-05-07T19:45:50.4419348Z libcublas-12.8.3.14 | 460.2 MB | ##2 | 23% 2025-05-07T19:45:50.4419912Z 2025-05-07T19:45:50.4909951Z nsight-compute-2025. | 320.6 MB | ## | 20%  2025-05-07T19:45:50.4910255Z 2025-05-07T19:45:50.4910260Z 2025-05-07T19:45:50.4910264Z 2025-05-07T19:45:50.4910269Z 2025-05-07T19:45:50.5275096Z libcufft-11.3.3.41 | 147.4 MB | ###2 | 33%  2025-05-07T19:45:50.5275410Z 2025-05-07T19:45:50.5275414Z 2025-05-07T19:45:50.5275418Z 2025-05-07T19:45:50.5327962Z libcusolver-11.7.2.5 | 156.9 MB | ##9 | 29%  2025-05-07T19:45:50.5381880Z libcublas-12.8.3.14 | 460.2 MB | ##5 | 26% 2025-05-07T19:45:50.5382214Z 2025-05-07T19:45:50.5382246Z 2025-05-07T19:45:50.5777266Z libcusparse-12.5.7.5 | 164.9 MB | ###7 | 38%  2025-05-07T19:45:50.5777612Z 2025-05-07T19:45:50.5910962Z nsight-compute-2025. 
| 320.6 MB | ##2 | 22%  2025-05-07T19:45:50.5911264Z 2025-05-07T19:45:50.5911269Z 2025-05-07T19:45:50.5911273Z 2025-05-07T19:45:50.5911277Z 2025-05-07T19:45:50.6277928Z libcufft-11.3.3.41 | 147.4 MB | ###7 | 37%  2025-05-07T19:45:50.6278269Z 2025-05-07T19:45:50.6278275Z 2025-05-07T19:45:50.6278279Z 2025-05-07T19:45:50.6456599Z libcusolver-11.7.2.5 | 156.9 MB | ###3 | 34%  2025-05-07T19:45:50.6821886Z libcublas-12.8.3.14 | 460.2 MB | ##8 | 28% 2025-05-07T19:45:50.6822184Z 2025-05-07T19:45:50.7467095Z nsight-compute-2025. | 320.6 MB | ##3 | 24%  2025-05-07T19:45:50.7623617Z libcublas-12.8.3.14 | 460.2 MB | ###1 | 32% 2025-05-07T19:45:50.7623915Z 2025-05-07T19:45:50.7623920Z 2025-05-07T19:45:50.8140193Z libcusparse-12.5.7.5 | 164.9 MB | ####1 | 42%  2025-05-07T19:45:50.8140833Z 2025-05-07T19:45:50.8292217Z nsight-compute-2025. | 320.6 MB | ##5 | 25%  2025-05-07T19:45:50.8292527Z 2025-05-07T19:45:50.8292532Z 2025-05-07T19:45:50.8292536Z 2025-05-07T19:45:50.8697816Z libcusolver-11.7.2.5 | 156.9 MB | ###7 | 37%  2025-05-07T19:45:50.8698154Z 2025-05-07T19:45:50.8698162Z 2025-05-07T19:45:50.8698167Z 2025-05-07T19:45:50.8698172Z 2025-05-07T19:45:50.8902087Z libcufft-11.3.3.41 | 147.4 MB | ####1 | 41%  2025-05-07T19:45:50.8990919Z libcublas-12.8.3.14 | 460.2 MB | ###4 | 34% 2025-05-07T19:45:50.8991238Z 2025-05-07T19:45:50.8991483Z 2025-05-07T19:45:50.9277431Z libcusparse-12.5.7.5 | 164.9 MB | ####5 | 45%  2025-05-07T19:45:50.9277755Z 2025-05-07T19:45:50.9698482Z nsight-compute-2025. 
| 320.6 MB | ##7 | 27%  2025-05-07T19:45:50.9698827Z 2025-05-07T19:45:50.9698832Z 2025-05-07T19:45:50.9698837Z 2025-05-07T19:45:50.9698841Z 2025-05-07T19:45:50.9791312Z libcufft-11.3.3.41 | 147.4 MB | ##### | 51%  2025-05-07T19:45:50.9791641Z 2025-05-07T19:45:50.9791670Z 2025-05-07T19:45:50.9791674Z 2025-05-07T19:45:50.9902028Z libcusolver-11.7.2.5 | 156.9 MB | ###9 | 40%  2025-05-07T19:45:51.0172623Z libcublas-12.8.3.14 | 460.2 MB | ###7 | 37% 2025-05-07T19:45:51.0173079Z 2025-05-07T19:45:51.0173093Z 2025-05-07T19:45:51.0305658Z libcusparse-12.5.7.5 | 164.9 MB | ####8 | 48%  2025-05-07T19:45:51.0305981Z 2025-05-07T19:45:51.0732019Z nsight-compute-2025. | 320.6 MB | ##8 | 29%  2025-05-07T19:45:51.0732326Z 2025-05-07T19:45:51.0732331Z 2025-05-07T19:45:51.0732335Z 2025-05-07T19:45:51.0902876Z 2025-05-07T19:45:51.0903348Z libcufft-11.3.3.41 | 147.4 MB | #####6 | 57%  2025-05-07T19:45:51.1215894Z libcublas-12.8.3.14 | 460.2 MB | ###9 | 40% 2025-05-07T19:45:51.1216341Z 2025-05-07T19:45:51.1216355Z 2025-05-07T19:45:51.1306512Z libcusparse-12.5.7.5 | 164.9 MB | #####2 | 53%  2025-05-07T19:45:51.1307075Z 2025-05-07T19:45:51.1348260Z nsight-compute-2025. | 320.6 MB | ### | 31%  2025-05-07T19:45:51.1348575Z 2025-05-07T19:45:51.1348580Z 2025-05-07T19:45:51.1348597Z 2025-05-07T19:45:51.2217033Z libcusolver-11.7.2.5 | 156.9 MB | ####2 | 42%  2025-05-07T19:45:51.2217351Z 2025-05-07T19:45:51.2217356Z 2025-05-07T19:45:51.2303167Z libcusparse-12.5.7.5 | 164.9 MB | #####7 | 57%  2025-05-07T19:45:51.2349978Z libcublas-12.8.3.14 | 460.2 MB | ####2 | 43% 2025-05-07T19:45:51.2350290Z 2025-05-07T19:45:51.2350295Z 2025-05-07T19:45:51.2350322Z 2025-05-07T19:45:51.2402511Z libcusolver-11.7.2.5 | 156.9 MB | ####6 | 46%  2025-05-07T19:45:51.2402832Z 2025-05-07T19:45:51.2510312Z nsight-compute-2025. 
| 320.6 MB | ###2 | 32%  2025-05-07T19:45:51.2510778Z 2025-05-07T19:45:51.2510783Z 2025-05-07T19:45:51.2510787Z 2025-05-07T19:45:51.2510796Z 2025-05-07T19:45:51.3351713Z libcufft-11.3.3.41 | 147.4 MB | ######2 | 62%  2025-05-07T19:45:51.3352113Z 2025-05-07T19:45:51.3352118Z 2025-05-07T19:45:51.3352138Z 2025-05-07T19:45:51.3367427Z libcusolver-11.7.2.5 | 156.9 MB | ####9 | 50%  2025-05-07T19:45:51.3367757Z 2025-05-07T19:45:51.3367788Z 2025-05-07T19:45:51.3586440Z libcusparse-12.5.7.5 | 164.9 MB | ###### | 61%  2025-05-07T19:45:51.3586751Z 2025-05-07T19:45:51.3586756Z 2025-05-07T19:45:51.3586760Z 2025-05-07T19:45:51.3586769Z 2025-05-07T19:45:51.3820410Z libcufft-11.3.3.41 | 147.4 MB | ######6 | 67%  2025-05-07T19:45:51.4055059Z libcublas-12.8.3.14 | 460.2 MB | ####4 | 45% 2025-05-07T19:45:51.4056525Z 2025-05-07T19:45:51.4356221Z nsight-compute-2025. | 320.6 MB | ###4 | 34%  2025-05-07T19:45:51.4356530Z 2025-05-07T19:45:51.4356535Z 2025-05-07T19:45:51.4369827Z 2025-05-07T19:45:51.4370896Z libcusolver-11.7.2.5 | 156.9 MB | #####3 | 54%  2025-05-07T19:45:51.4371802Z 2025-05-07T19:45:51.4371855Z 2025-05-07T19:45:51.4815110Z libcusparse-12.5.7.5 | 164.9 MB | ######4 | 65%  2025-05-07T19:45:51.4815476Z 2025-05-07T19:45:51.4815481Z 2025-05-07T19:45:51.4815485Z 2025-05-07T19:45:51.4815488Z 2025-05-07T19:45:51.5096885Z libcufft-11.3.3.41 | 147.4 MB | #######1 | 71%  2025-05-07T19:45:51.5097210Z 2025-05-07T19:45:51.5271120Z nsight-compute-2025. 
| 320.6 MB | ###5 | 36%  2025-05-07T19:45:51.5359798Z libcublas-12.8.3.14 | 460.2 MB | ####7 | 47% 2025-05-07T19:45:51.5360662Z 2025-05-07T19:45:51.5360675Z 2025-05-07T19:45:51.5360685Z 2025-05-07T19:45:51.5669286Z libcusolver-11.7.2.5 | 156.9 MB | #####7 | 57%  2025-05-07T19:45:51.5669610Z 2025-05-07T19:45:51.5669615Z 2025-05-07T19:45:51.5818810Z libcusparse-12.5.7.5 | 164.9 MB | ######8 | 68%  2025-05-07T19:45:51.5819297Z 2025-05-07T19:45:51.5819307Z 2025-05-07T19:45:51.5819314Z 2025-05-07T19:45:51.5819322Z 2025-05-07T19:45:51.6102566Z libcufft-11.3.3.41 | 147.4 MB | #######5 | 75%  2025-05-07T19:45:51.6103402Z 2025-05-07T19:45:51.6462990Z nsight-compute-2025. | 320.6 MB | ###7 | 38%  2025-05-07T19:45:51.6463297Z 2025-05-07T19:45:51.6463302Z 2025-05-07T19:45:51.6463539Z 2025-05-07T19:45:51.6669465Z libcusolver-11.7.2.5 | 156.9 MB | ###### | 61%  2025-05-07T19:45:51.6669794Z 2025-05-07T19:45:51.6669798Z 2025-05-07T19:45:51.6765659Z libcusparse-12.5.7.5 | 164.9 MB | #######1 | 72%  2025-05-07T19:45:51.6827531Z libcublas-12.8.3.14 | 460.2 MB | ####9 | 49% 2025-05-07T19:45:51.6827846Z 2025-05-07T19:45:51.6827851Z 2025-05-07T19:45:51.6827878Z 2025-05-07T19:45:51.6827882Z 2025-05-07T19:45:51.7101596Z libcufft-11.3.3.41 | 147.4 MB | #######9 | 79%  2025-05-07T19:45:51.7101921Z 2025-05-07T19:45:51.7467098Z nsight-compute-2025. 
| 320.6 MB | ###9 | 39%  2025-05-07T19:45:51.7467404Z 2025-05-07T19:45:51.7467409Z 2025-05-07T19:45:51.7467448Z 2025-05-07T19:45:51.7670196Z libcusolver-11.7.2.5 | 156.9 MB | ######3 | 64%  2025-05-07T19:45:51.7670547Z 2025-05-07T19:45:51.7670552Z 2025-05-07T19:45:51.7854079Z libcusparse-12.5.7.5 | 164.9 MB | #######6 | 76%  2025-05-07T19:45:51.8470619Z libcublas-12.8.3.14 | 460.2 MB | ##### | 51% 2025-05-07T19:45:51.8470922Z 2025-05-07T19:45:51.8470953Z 2025-05-07T19:45:51.8470968Z 2025-05-07T19:45:51.8488646Z libcusolver-11.7.2.5 | 156.9 MB | ######7 | 68%  2025-05-07T19:45:51.8489001Z 2025-05-07T19:45:51.8489080Z 2025-05-07T19:45:51.8489085Z 2025-05-07T19:45:51.8489089Z 2025-05-07T19:45:51.8671554Z libcufft-11.3.3.41 | 147.4 MB | ########3 | 84%  2025-05-07T19:45:51.8671911Z 2025-05-07T19:45:51.8671917Z 2025-05-07T19:45:51.8687001Z libcusparse-12.5.7.5 | 164.9 MB | ######## | 80%  2025-05-07T19:45:51.8687349Z 2025-05-07T19:45:51.9102717Z nsight-compute-2025. | 320.6 MB | ####1 | 41%  2025-05-07T19:45:51.9489412Z libcublas-12.8.3.14 | 460.2 MB | #####2 | 53% 2025-05-07T19:45:51.9489742Z 2025-05-07T19:45:51.9489930Z 2025-05-07T19:45:51.9489939Z 2025-05-07T19:45:51.9489945Z 2025-05-07T19:45:51.9690014Z libcufft-11.3.3.41 | 147.4 MB | ########7 | 88%  2025-05-07T19:45:51.9690367Z 2025-05-07T19:45:51.9727564Z nsight-compute-2025. 
| 320.6 MB | ####2 | 43%  2025-05-07T19:45:51.9727900Z 2025-05-07T19:45:51.9727905Z 2025-05-07T19:45:51.9727909Z 2025-05-07T19:45:51.9755907Z libcusolver-11.7.2.5 | 156.9 MB | #######1 | 71%  2025-05-07T19:45:51.9756221Z 2025-05-07T19:45:52.0428688Z 2025-05-07T19:45:52.0429200Z libcusparse-12.5.7.5 | 164.9 MB | ########3 | 84%  2025-05-07T19:45:52.0489669Z libcublas-12.8.3.14 | 460.2 MB | #####4 | 54% 2025-05-07T19:45:52.0490027Z 2025-05-07T19:45:52.0490247Z 2025-05-07T19:45:52.0490255Z 2025-05-07T19:45:52.0490261Z 2025-05-07T19:45:52.0691089Z libcufft-11.3.3.41 | 147.4 MB | #########2 | 92%  2025-05-07T19:45:52.0691453Z 2025-05-07T19:45:52.0728881Z nsight-compute-2025. | 320.6 MB | ####4 | 45%  2025-05-07T19:45:52.0729184Z 2025-05-07T19:45:52.0729277Z 2025-05-07T19:45:52.0729306Z 2025-05-07T19:45:52.1061936Z libcusolver-11.7.2.5 | 156.9 MB | #######4 | 75%  2025-05-07T19:45:52.1062250Z 2025-05-07T19:45:52.1063039Z 2025-05-07T19:45:52.1494917Z libcusparse-12.5.7.5 | 164.9 MB | ########7 | 87%  2025-05-07T19:45:52.1495232Z 2025-05-07T19:45:52.1495237Z 2025-05-07T19:45:52.1495240Z 2025-05-07T19:45:52.1495244Z 2025-05-07T19:45:52.1541739Z libcufft-11.3.3.41 | 147.4 MB | #########6 | 96%  2025-05-07T19:45:52.1694927Z libcublas-12.8.3.14 | 460.2 MB | #####5 | 56% 2025-05-07T19:45:52.1695274Z 2025-05-07T19:45:52.1852009Z nsight-compute-2025. | 320.6 MB | ####6 | 47%  2025-05-07T19:45:52.1852312Z 2025-05-07T19:45:52.1852318Z 2025-05-07T19:45:52.1852323Z 2025-05-07T19:45:52.2063094Z libcusolver-11.7.2.5 | 156.9 MB | #######7 | 78%  2025-05-07T19:45:52.2063457Z 2025-05-07T19:45:52.2063462Z 2025-05-07T19:45:52.2585653Z libcusparse-12.5.7.5 | 164.9 MB | #########1 | 91%  2025-05-07T19:45:52.2694326Z libcublas-12.8.3.14 | 460.2 MB | #####7 | 57% 2025-05-07T19:45:52.2694817Z 2025-05-07T19:45:52.2853896Z nsight-compute-2025. 
| 320.6 MB | ####8 | 49%  2025-05-07T19:45:52.2854199Z 2025-05-07T19:45:52.2854204Z 2025-05-07T19:45:52.2854208Z 2025-05-07T19:45:52.3064599Z libcusolver-11.7.2.5 | 156.9 MB | ########1 | 81%  2025-05-07T19:45:52.3064922Z 2025-05-07T19:45:52.3064926Z 2025-05-07T19:45:52.3586258Z libcusparse-12.5.7.5 | 164.9 MB | #########4 | 95%  2025-05-07T19:45:52.3696333Z libcublas-12.8.3.14 | 460.2 MB | #####8 | 59% 2025-05-07T19:45:52.3696941Z 2025-05-07T19:45:52.3856068Z nsight-compute-2025. | 320.6 MB | #####1 | 51%  2025-05-07T19:45:52.3856373Z 2025-05-07T19:45:52.3856378Z 2025-05-07T19:45:52.3856382Z 2025-05-07T19:45:52.4087873Z libcusolver-11.7.2.5 | 156.9 MB | ########5 | 85%  2025-05-07T19:45:52.4088425Z 2025-05-07T19:45:52.4088430Z 2025-05-07T19:45:52.4588010Z libcusparse-12.5.7.5 | 164.9 MB | #########8 | 98%  2025-05-07T19:45:52.4855563Z libcublas-12.8.3.14 | 460.2 MB | ###### | 61% 2025-05-07T19:45:52.4856050Z 2025-05-07T19:45:52.4856064Z 2025-05-07T19:45:52.4856070Z 2025-05-07T19:45:52.4887578Z libcusolver-11.7.2.5 | 156.9 MB | ######### | 90%  2025-05-07T19:45:52.4887898Z 2025-05-07T19:45:52.5588737Z nsight-compute-2025. | 320.6 MB | #####3 | 53%  2025-05-07T19:45:52.5857683Z libcublas-12.8.3.14 | 460.2 MB | ######2 | 63% 2025-05-07T19:45:52.5858130Z 2025-05-07T19:45:52.5858143Z 2025-05-07T19:45:52.5858152Z 2025-05-07T19:45:52.6021375Z libcusolver-11.7.2.5 | 156.9 MB | #########6 | 96%  2025-05-07T19:45:52.6021687Z 2025-05-07T19:45:52.6589673Z nsight-compute-2025. | 320.6 MB | #####4 | 55%  2025-05-07T19:45:52.7037377Z libcublas-12.8.3.14 | 460.2 MB | ######4 | 65% 2025-05-07T19:45:52.7037849Z 2025-05-07T19:45:52.7589684Z nsight-compute-2025. | 320.6 MB | #####7 | 58%  2025-05-07T19:45:52.8092301Z libcublas-12.8.3.14 | 460.2 MB | ######8 | 68% 2025-05-07T19:45:52.8092726Z 2025-05-07T19:45:52.8590332Z nsight-compute-2025. 
| 320.6 MB | ###### | 60%  2025-05-07T19:45:52.9601545Z libcublas-12.8.3.14 | 460.2 MB | #######1 | 72% 2025-05-07T19:45:52.9616629Z libcublas-12.8.3.14 | 460.2 MB | #######5 | 75% 2025-05-07T19:45:52.9617081Z 2025-05-07T19:45:53.0688707Z nsight-compute-2025. | 320.6 MB | ######2 | 62%  2025-05-07T19:45:53.0689037Z 2025-05-07T19:45:53.0711093Z nsight-compute-2025. | 320.6 MB | ######5 | 66%  2025-05-07T19:45:53.1711900Z libcublas-12.8.3.14 | 460.2 MB | #######8 | 78% 2025-05-07T19:45:53.1733535Z libcublas-12.8.3.14 | 460.2 MB | ########1 | 82% 2025-05-07T19:45:53.1733979Z 2025-05-07T19:45:53.2605646Z nsight-compute-2025. | 320.6 MB | ######9 | 70%  2025-05-07T19:45:53.2605975Z 2025-05-07T19:45:53.2606010Z 2025-05-07T19:45:53.2606018Z 2025-05-07T19:45:53.2606023Z 2025-05-07T19:45:53.2715230Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:45:53.3116270Z libcublas-12.8.3.14 | 460.2 MB | ########5 | 86% 2025-05-07T19:45:53.3116722Z 2025-05-07T19:45:53.3116740Z 2025-05-07T19:45:53.3116745Z 2025-05-07T19:45:53.3116750Z 2025-05-07T19:45:53.3116754Z 2025-05-07T19:45:53.3380699Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:45:53.3381029Z 2025-05-07T19:45:53.4130981Z nsight-compute-2025. | 320.6 MB | #######2 | 72%  2025-05-07T19:45:53.4140109Z libcublas-12.8.3.14 | 460.2 MB | ########9 | 89% 2025-05-07T19:45:53.4140400Z 2025-05-07T19:45:53.4140421Z 2025-05-07T19:45:53.4140425Z 2025-05-07T19:45:53.4140428Z 2025-05-07T19:45:53.4140437Z 2025-05-07T19:45:53.4704279Z libnpp-12.3.3.65 | 130.6 MB | 5 | 6%  2025-05-07T19:45:53.4704596Z 2025-05-07T19:45:53.5142060Z nsight-compute-2025. 
| 320.6 MB | #######4 | 75%  2025-05-07T19:45:53.5142405Z 2025-05-07T19:45:53.5142410Z 2025-05-07T19:45:53.5142414Z 2025-05-07T19:45:53.5142433Z 2025-05-07T19:45:53.5142437Z 2025-05-07T19:45:53.5376647Z libnpp-12.3.3.65 | 130.6 MB | #1 | 12%  2025-05-07T19:45:53.5676408Z libcublas-12.8.3.14 | 460.2 MB | #########2 | 92% 2025-05-07T19:45:53.5676818Z 2025-05-07T19:45:53.5676856Z 2025-05-07T19:45:53.6023399Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:45:53.6023714Z 2025-05-07T19:45:53.6174398Z nsight-compute-2025. | 320.6 MB | #######7 | 77%  2025-05-07T19:45:53.6174709Z 2025-05-07T19:45:53.6174713Z 2025-05-07T19:45:53.6174717Z 2025-05-07T19:45:53.6174720Z 2025-05-07T19:45:53.6174724Z 2025-05-07T19:45:53.6174728Z 2025-05-07T19:45:53.6406263Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:45:53.7176224Z libcublas-12.8.3.14 | 460.2 MB | #########4 | 95% 2025-05-07T19:45:53.7176561Z 2025-05-07T19:45:53.7176811Z 2025-05-07T19:45:53.7176815Z 2025-05-07T19:45:53.7176820Z 2025-05-07T19:45:53.7176824Z 2025-05-07T19:45:53.7176970Z 2025-05-07T19:45:53.7289462Z cuda-nsight-12.8.55 | 113.2 MB | 6 | 7%  2025-05-07T19:45:53.7289813Z 2025-05-07T19:45:53.7289818Z 2025-05-07T19:45:53.7289822Z 2025-05-07T19:45:53.7289825Z 2025-05-07T19:45:53.7289829Z 2025-05-07T19:45:53.7508686Z libnpp-12.3.3.65 | 130.6 MB | #6 | 16%  2025-05-07T19:45:53.7509013Z 2025-05-07T19:45:53.7509018Z 2025-05-07T19:45:53.7510347Z 2025-05-07T19:45:53.7797442Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:45:53.7797775Z 2025-05-07T19:45:53.8023487Z nsight-compute-2025. 
| 320.6 MB | #######9 | 79%  2025-05-07T19:45:53.8023793Z 2025-05-07T19:45:53.8023799Z 2025-05-07T19:45:53.8023802Z 2025-05-07T19:45:53.8023821Z 2025-05-07T19:45:53.8023825Z 2025-05-07T19:45:53.8023829Z 2025-05-07T19:45:53.8025107Z 2025-05-07T19:45:53.8292796Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:45:53.8293126Z 2025-05-07T19:45:53.8293155Z 2025-05-07T19:45:53.8293159Z 2025-05-07T19:45:53.8293176Z 2025-05-07T19:45:53.8293180Z 2025-05-07T19:45:53.8328480Z libnpp-12.3.3.65 | 130.6 MB | ## | 20%  2025-05-07T19:45:53.8616733Z libcublas-12.8.3.14 | 460.2 MB | #########7 | 98% 2025-05-07T19:45:53.8617201Z 2025-05-07T19:45:53.8617223Z 2025-05-07T19:45:53.8617228Z 2025-05-07T19:45:53.8617233Z 2025-05-07T19:45:53.8617238Z 2025-05-07T19:45:53.8617242Z 2025-05-07T19:45:53.8930196Z cuda-nsight-12.8.55 | 113.2 MB | # | 10%  2025-05-07T19:45:53.8930534Z 2025-05-07T19:45:53.9024793Z nsight-compute-2025. | 320.6 MB | ######## | 81%  2025-05-07T19:45:53.9025120Z 2025-05-07T19:45:53.9025125Z 2025-05-07T19:45:53.9025129Z 2025-05-07T19:45:53.9025133Z 2025-05-07T19:45:53.9025138Z 2025-05-07T19:45:53.9025142Z 2025-05-07T19:45:53.9025174Z 2025-05-07T19:45:53.9294226Z cuda-nvvp-12.8.57 | 112.4 MB | 3 | 4%  2025-05-07T19:45:53.9294555Z 2025-05-07T19:45:53.9294583Z 2025-05-07T19:45:53.9294587Z 2025-05-07T19:45:53.9294591Z 2025-05-07T19:45:53.9294595Z 2025-05-07T19:45:53.9620388Z libnpp-12.3.3.65 | 130.6 MB | ##4 | 24%  2025-05-07T19:45:53.9620737Z 2025-05-07T19:45:53.9620742Z 2025-05-07T19:45:53.9620746Z 2025-05-07T19:45:53.9620751Z 2025-05-07T19:45:53.9620754Z 2025-05-07T19:45:53.9620773Z 2025-05-07T19:45:54.0025055Z cuda-nsight-12.8.55 | 113.2 MB | #4 | 15%  2025-05-07T19:45:54.0025396Z 2025-05-07T19:45:54.0025400Z 2025-05-07T19:45:54.0025405Z 2025-05-07T19:45:54.0025409Z 2025-05-07T19:45:54.0025412Z 2025-05-07T19:45:54.0025416Z 2025-05-07T19:45:54.0026278Z 2025-05-07T19:45:54.0295934Z cuda-nvvp-12.8.57 | 112.4 MB | 8 | 9%  2025-05-07T19:45:54.0296265Z 
2025-05-07T19:45:54.0296270Z 2025-05-07T19:45:54.0296298Z 2025-05-07T19:45:54.0296302Z 2025-05-07T19:45:54.0296305Z 2025-05-07T19:45:54.1027087Z libnpp-12.3.3.65 | 130.6 MB | ##8 | 28%  2025-05-07T19:45:54.1027422Z 2025-05-07T19:45:54.1027427Z 2025-05-07T19:45:54.1027431Z 2025-05-07T19:45:54.1027435Z 2025-05-07T19:45:54.1027439Z 2025-05-07T19:45:54.1027442Z 2025-05-07T19:45:54.1027445Z 2025-05-07T19:45:54.1029826Z cuda-nvvp-12.8.57 | 112.4 MB | #3 | 13%  2025-05-07T19:45:54.1030129Z 2025-05-07T19:45:54.1030148Z 2025-05-07T19:45:54.1030151Z 2025-05-07T19:45:54.1030155Z 2025-05-07T19:45:54.1030158Z 2025-05-07T19:45:54.1030937Z 2025-05-07T19:45:54.1084348Z cuda-nsight-12.8.55 | 113.2 MB | #8 | 18%  2025-05-07T19:45:54.1084696Z 2025-05-07T19:45:54.1300887Z nsight-compute-2025. | 320.6 MB | ########2 | 83%  2025-05-07T19:45:54.1301208Z 2025-05-07T19:45:54.1301230Z 2025-05-07T19:45:54.1301235Z 2025-05-07T19:45:54.1301240Z 2025-05-07T19:45:54.1301458Z 2025-05-07T19:45:54.2028173Z libnpp-12.3.3.65 | 130.6 MB | ###2 | 33%  2025-05-07T19:45:54.2028496Z 2025-05-07T19:45:54.2028739Z 2025-05-07T19:45:54.2028745Z 2025-05-07T19:45:54.2028749Z 2025-05-07T19:45:54.2028768Z 2025-05-07T19:45:54.2028771Z 2025-05-07T19:45:54.2028775Z 2025-05-07T19:45:54.2032755Z cuda-nvvp-12.8.57 | 112.4 MB | #6 | 17%  2025-05-07T19:45:54.2033057Z 2025-05-07T19:45:54.2033061Z 2025-05-07T19:45:54.2033069Z 2025-05-07T19:45:54.2033072Z 2025-05-07T19:45:54.2033076Z 2025-05-07T19:45:54.2033417Z 2025-05-07T19:45:54.2086262Z cuda-nsight-12.8.55 | 113.2 MB | ##2 | 23%  2025-05-07T19:45:54.2086596Z 2025-05-07T19:45:54.2366603Z nsight-compute-2025. 
| 320.6 MB | ########4 | 84%  2025-05-07T19:45:54.2366914Z 2025-05-07T19:45:54.2366919Z 2025-05-07T19:45:54.2366923Z 2025-05-07T19:45:54.2366927Z 2025-05-07T19:45:54.2366931Z 2025-05-07T19:45:54.3029514Z libnpp-12.3.3.65 | 130.6 MB | ###6 | 37%  2025-05-07T19:45:54.3029867Z 2025-05-07T19:45:54.3029871Z 2025-05-07T19:45:54.3029876Z 2025-05-07T19:45:54.3029896Z 2025-05-07T19:45:54.3029900Z 2025-05-07T19:45:54.3029903Z 2025-05-07T19:45:54.3029907Z 2025-05-07T19:45:54.3031545Z cuda-nvvp-12.8.57 | 112.4 MB | ## | 21%  2025-05-07T19:45:54.3031860Z 2025-05-07T19:45:54.3031868Z 2025-05-07T19:45:54.3031871Z 2025-05-07T19:45:54.3031875Z 2025-05-07T19:45:54.3031879Z 2025-05-07T19:45:54.3032021Z 2025-05-07T19:45:54.3086899Z cuda-nsight-12.8.55 | 113.2 MB | ##7 | 27%  2025-05-07T19:45:54.3087251Z 2025-05-07T19:45:54.3367463Z nsight-compute-2025. | 320.6 MB | ########5 | 86%  2025-05-07T19:45:54.3367772Z 2025-05-07T19:45:54.3367777Z 2025-05-07T19:45:54.3367780Z 2025-05-07T19:45:54.3367784Z 2025-05-07T19:45:54.3367800Z 2025-05-07T19:45:54.4031539Z libnpp-12.3.3.65 | 130.6 MB | ####1 | 41%  2025-05-07T19:45:54.4031898Z 2025-05-07T19:45:54.4031903Z 2025-05-07T19:45:54.4031906Z 2025-05-07T19:45:54.4031910Z 2025-05-07T19:45:54.4031913Z 2025-05-07T19:45:54.4031930Z 2025-05-07T19:45:54.4033704Z 2025-05-07T19:45:54.4089711Z cuda-nvvp-12.8.57 | 112.4 MB | ##5 | 26%  2025-05-07T19:45:54.4090042Z 2025-05-07T19:45:54.4368959Z nsight-compute-2025. 
| 320.6 MB | ########7 | 88%  2025-05-07T19:45:54.4369280Z 2025-05-07T19:45:54.4369284Z 2025-05-07T19:45:54.4369288Z 2025-05-07T19:45:54.4369291Z 2025-05-07T19:45:54.4369295Z 2025-05-07T19:45:54.5036496Z libnpp-12.3.3.65 | 130.6 MB | ####5 | 46%  2025-05-07T19:45:54.5036821Z 2025-05-07T19:45:54.5036826Z 2025-05-07T19:45:54.5036829Z 2025-05-07T19:45:54.5036833Z 2025-05-07T19:45:54.5036836Z 2025-05-07T19:45:54.5036839Z 2025-05-07T19:45:54.5036843Z 2025-05-07T19:45:54.5090438Z cuda-nvvp-12.8.57 | 112.4 MB | ### | 30%  2025-05-07T19:45:54.5090773Z 2025-05-07T19:45:54.5368700Z nsight-compute-2025. | 320.6 MB | ########9 | 90%  2025-05-07T19:45:54.5369014Z 2025-05-07T19:45:54.5369033Z 2025-05-07T19:45:54.5369066Z 2025-05-07T19:45:54.5369071Z 2025-05-07T19:45:54.5369074Z 2025-05-07T19:45:54.5385570Z libnpp-12.3.3.65 | 130.6 MB | ####9 | 50%  2025-05-07T19:45:54.5385895Z 2025-05-07T19:45:54.5385900Z 2025-05-07T19:45:54.5385903Z 2025-05-07T19:45:54.5385908Z 2025-05-07T19:45:54.5385912Z 2025-05-07T19:45:54.5386349Z 2025-05-07T19:45:54.6129190Z cuda-nsight-12.8.55 | 113.2 MB | ###1 | 32%  2025-05-07T19:45:54.6129536Z 2025-05-07T19:45:54.6129543Z 2025-05-07T19:45:54.6129549Z 2025-05-07T19:45:54.6129553Z 2025-05-07T19:45:54.6129557Z 2025-05-07T19:45:54.6129575Z 2025-05-07T19:45:54.6129579Z 2025-05-07T19:45:54.6150047Z cuda-nvvp-12.8.57 | 112.4 MB | ###4 | 35%  2025-05-07T19:45:54.6150378Z 2025-05-07T19:45:54.6388502Z nsight-compute-2025. 
| 320.6 MB | #########1 | 91%  2025-05-07T19:45:54.6389084Z 2025-05-07T19:45:54.6389089Z 2025-05-07T19:45:54.6389093Z 2025-05-07T19:45:54.6389097Z 2025-05-07T19:45:54.6389227Z 2025-05-07T19:45:54.6389232Z 2025-05-07T19:45:54.6470742Z cuda-nsight-12.8.55 | 113.2 MB | ###5 | 35%  2025-05-07T19:45:54.6471094Z 2025-05-07T19:45:54.6471098Z 2025-05-07T19:45:54.6471101Z 2025-05-07T19:45:54.6471105Z 2025-05-07T19:45:54.6471109Z 2025-05-07T19:45:54.7144254Z libnpp-12.3.3.65 | 130.6 MB | #####4 | 54%  2025-05-07T19:45:54.7144588Z 2025-05-07T19:45:54.7144592Z 2025-05-07T19:45:54.7144595Z 2025-05-07T19:45:54.7144600Z 2025-05-07T19:45:54.7144604Z 2025-05-07T19:45:54.7144608Z 2025-05-07T19:45:54.7145601Z 2025-05-07T19:45:54.7282598Z cuda-nvvp-12.8.57 | 112.4 MB | ###9 | 39%  2025-05-07T19:45:54.7283534Z 2025-05-07T19:45:54.7388700Z nsight-compute-2025. | 320.6 MB | #########3 | 93%  2025-05-07T19:45:54.7389010Z 2025-05-07T19:45:54.7389015Z 2025-05-07T19:45:54.7389040Z 2025-05-07T19:45:54.7389044Z 2025-05-07T19:45:54.7389048Z 2025-05-07T19:45:54.7389081Z 2025-05-07T19:45:54.7509403Z cuda-nsight-12.8.55 | 113.2 MB | ###9 | 40%  2025-05-07T19:45:54.7509752Z 2025-05-07T19:45:54.7509756Z 2025-05-07T19:45:54.7509760Z 2025-05-07T19:45:54.7509764Z 2025-05-07T19:45:54.7509767Z 2025-05-07T19:45:54.8182550Z libnpp-12.3.3.65 | 130.6 MB | #####8 | 58%  2025-05-07T19:45:54.8182886Z 2025-05-07T19:45:54.8182892Z 2025-05-07T19:45:54.8182896Z 2025-05-07T19:45:54.8182899Z 2025-05-07T19:45:54.8182903Z 2025-05-07T19:45:54.8182906Z 2025-05-07T19:45:54.8183761Z 2025-05-07T19:45:54.8282812Z cuda-nvvp-12.8.57 | 112.4 MB | ####3 | 43%  2025-05-07T19:45:54.8283148Z 2025-05-07T19:45:54.8389102Z nsight-compute-2025. 
| 320.6 MB | #########4 | 95%  2025-05-07T19:45:54.8389430Z 2025-05-07T19:45:54.8389435Z 2025-05-07T19:45:54.8389440Z 2025-05-07T19:45:54.8389443Z 2025-05-07T19:45:54.8389471Z 2025-05-07T19:45:54.8391983Z 2025-05-07T19:45:54.8550577Z cuda-nsight-12.8.55 | 113.2 MB | ####4 | 44%  2025-05-07T19:45:54.8550924Z 2025-05-07T19:45:54.8550928Z 2025-05-07T19:45:54.8550932Z 2025-05-07T19:45:54.8550935Z 2025-05-07T19:45:54.8550939Z 2025-05-07T19:45:54.9219663Z libnpp-12.3.3.65 | 130.6 MB | ######2 | 62%  2025-05-07T19:45:54.9220010Z 2025-05-07T19:45:54.9220016Z 2025-05-07T19:45:54.9220020Z 2025-05-07T19:45:54.9220024Z 2025-05-07T19:45:54.9220029Z 2025-05-07T19:45:54.9220033Z 2025-05-07T19:45:54.9220037Z 2025-05-07T19:45:54.9284857Z cuda-nvvp-12.8.57 | 112.4 MB | ####7 | 48%  2025-05-07T19:45:54.9285180Z 2025-05-07T19:45:54.9390654Z nsight-compute-2025. | 320.6 MB | #########6 | 96%  2025-05-07T19:45:54.9390960Z 2025-05-07T19:45:54.9390965Z 2025-05-07T19:45:54.9390969Z 2025-05-07T19:45:54.9390973Z 2025-05-07T19:45:54.9390977Z 2025-05-07T19:45:54.9390980Z 2025-05-07T19:45:54.9550796Z cuda-nsight-12.8.55 | 113.2 MB | ####8 | 49%  2025-05-07T19:45:54.9551131Z 2025-05-07T19:45:54.9551156Z 2025-05-07T19:45:54.9551160Z 2025-05-07T19:45:54.9551164Z 2025-05-07T19:45:54.9551167Z 2025-05-07T19:45:55.0221771Z libnpp-12.3.3.65 | 130.6 MB | ######6 | 67%  2025-05-07T19:45:55.0222094Z 2025-05-07T19:45:55.0222098Z 2025-05-07T19:45:55.0222105Z 2025-05-07T19:45:55.0222110Z 2025-05-07T19:45:55.0222114Z 2025-05-07T19:45:55.0222117Z 2025-05-07T19:45:55.0222121Z 2025-05-07T19:45:55.0320706Z cuda-nvvp-12.8.57 | 112.4 MB | #####1 | 52%  2025-05-07T19:45:55.0321027Z 2025-05-07T19:45:55.0393920Z nsight-compute-2025. 
| 320.6 MB | #########8 | 98%  2025-05-07T19:45:55.0394223Z 2025-05-07T19:45:55.0394229Z 2025-05-07T19:45:55.0394233Z 2025-05-07T19:45:55.0394252Z 2025-05-07T19:45:55.0394256Z 2025-05-07T19:45:55.0395304Z 2025-05-07T19:45:55.0551485Z cuda-nsight-12.8.55 | 113.2 MB | #####3 | 53%  2025-05-07T19:45:55.0552074Z 2025-05-07T19:45:55.0552079Z 2025-05-07T19:45:55.0552083Z 2025-05-07T19:45:55.0552087Z 2025-05-07T19:45:55.0552272Z 2025-05-07T19:45:55.1223878Z libnpp-12.3.3.65 | 130.6 MB | ####### | 71%  2025-05-07T19:45:55.1225227Z 2025-05-07T19:45:55.1225231Z 2025-05-07T19:45:55.1225235Z 2025-05-07T19:45:55.1225239Z 2025-05-07T19:45:55.1225243Z 2025-05-07T19:45:55.1225247Z 2025-05-07T19:45:55.1225251Z 2025-05-07T19:45:55.1397648Z cuda-nvvp-12.8.57 | 112.4 MB | #####5 | 56%  2025-05-07T19:45:55.1397973Z 2025-05-07T19:45:55.1398111Z 2025-05-07T19:45:55.1398121Z 2025-05-07T19:45:55.1398127Z 2025-05-07T19:45:55.1398133Z 2025-05-07T19:45:55.1398143Z 2025-05-07T19:45:55.1438624Z cuda-nsight-12.8.55 | 113.2 MB | #####7 | 57%  2025-05-07T19:45:55.1439135Z 2025-05-07T19:45:55.1618617Z nsight-compute-2025. 
| 320.6 MB | #########9 | 100%  2025-05-07T19:45:55.1618917Z 2025-05-07T19:45:55.1618947Z 2025-05-07T19:45:55.1619015Z 2025-05-07T19:45:55.1619023Z 2025-05-07T19:45:55.1619055Z 2025-05-07T19:45:55.2226870Z libnpp-12.3.3.65 | 130.6 MB | #######4 | 75%  2025-05-07T19:45:55.2227187Z 2025-05-07T19:45:55.2227191Z 2025-05-07T19:45:55.2227195Z 2025-05-07T19:45:55.2227198Z 2025-05-07T19:45:55.2227202Z 2025-05-07T19:45:55.2227205Z 2025-05-07T19:45:55.2227287Z 2025-05-07T19:45:55.2399928Z cuda-nvvp-12.8.57 | 112.4 MB | ###### | 61%  2025-05-07T19:45:55.2400423Z 2025-05-07T19:45:55.2400427Z 2025-05-07T19:45:55.2400432Z 2025-05-07T19:45:55.2400436Z 2025-05-07T19:45:55.2400440Z 2025-05-07T19:45:55.2400443Z 2025-05-07T19:45:55.2621395Z cuda-nsight-12.8.55 | 113.2 MB | ######2 | 63%  2025-05-07T19:45:55.2621730Z 2025-05-07T19:45:55.2621734Z 2025-05-07T19:45:55.2621738Z 2025-05-07T19:45:55.2621741Z 2025-05-07T19:45:55.2621745Z 2025-05-07T19:45:55.3229320Z libnpp-12.3.3.65 | 130.6 MB | #######9 | 79%  2025-05-07T19:45:55.3229649Z 2025-05-07T19:45:55.3229654Z 2025-05-07T19:45:55.3229662Z 2025-05-07T19:45:55.3229678Z 2025-05-07T19:45:55.3229682Z 2025-05-07T19:45:55.3229685Z 2025-05-07T19:45:55.3229694Z 2025-05-07T19:45:55.3400482Z cuda-nvvp-12.8.57 | 112.4 MB | ######6 | 66%  2025-05-07T19:45:55.3400810Z 2025-05-07T19:45:55.3401143Z 2025-05-07T19:45:55.3401152Z 2025-05-07T19:45:55.3401157Z 2025-05-07T19:45:55.3401162Z 2025-05-07T19:45:55.3401167Z 2025-05-07T19:45:55.3621457Z cuda-nsight-12.8.55 | 113.2 MB | ######7 | 68%  2025-05-07T19:45:55.3621796Z 2025-05-07T19:45:55.3621801Z 2025-05-07T19:45:55.3621804Z 2025-05-07T19:45:55.3621807Z 2025-05-07T19:45:55.3621812Z 2025-05-07T19:45:55.4236701Z libnpp-12.3.3.65 | 130.6 MB | ########3 | 84%  2025-05-07T19:45:55.4237028Z 2025-05-07T19:45:55.4237033Z 2025-05-07T19:45:55.4237036Z 2025-05-07T19:45:55.4237040Z 2025-05-07T19:45:55.4237043Z 2025-05-07T19:45:55.4237062Z 2025-05-07T19:45:55.4237067Z 2025-05-07T19:45:55.4402794Z cuda-nvvp-12.8.57 
| 112.4 MB | #######1 | 71%  2025-05-07T19:45:55.4403126Z 2025-05-07T19:45:55.4403130Z 2025-05-07T19:45:55.4403134Z 2025-05-07T19:45:55.4403138Z 2025-05-07T19:45:55.4403141Z 2025-05-07T19:45:55.4403149Z 2025-05-07T19:45:55.4623649Z cuda-nsight-12.8.55 | 113.2 MB | #######2 | 73%  2025-05-07T19:45:55.4623998Z 2025-05-07T19:45:55.4624003Z 2025-05-07T19:45:55.4624006Z 2025-05-07T19:45:55.4624010Z 2025-05-07T19:45:55.4624014Z 2025-05-07T19:45:55.5237837Z libnpp-12.3.3.65 | 130.6 MB | ########8 | 89%  2025-05-07T19:45:55.5238215Z 2025-05-07T19:45:55.5238220Z 2025-05-07T19:45:55.5238224Z 2025-05-07T19:45:55.5238228Z 2025-05-07T19:45:55.5238232Z 2025-05-07T19:45:55.5238236Z 2025-05-07T19:45:55.5238304Z 2025-05-07T19:45:55.5404291Z cuda-nvvp-12.8.57 | 112.4 MB | #######6 | 76%  2025-05-07T19:45:55.5404880Z 2025-05-07T19:45:55.5404885Z 2025-05-07T19:45:55.5404889Z 2025-05-07T19:45:55.5404893Z 2025-05-07T19:45:55.5404897Z 2025-05-07T19:45:55.5405033Z 2025-05-07T19:45:55.5624098Z cuda-nsight-12.8.55 | 113.2 MB | #######7 | 78%  2025-05-07T19:45:55.5624441Z 2025-05-07T19:45:55.5624446Z 2025-05-07T19:45:55.5624450Z 2025-05-07T19:45:55.5624454Z 2025-05-07T19:45:55.5624457Z 2025-05-07T19:45:55.6245776Z libnpp-12.3.3.65 | 130.6 MB | #########3 | 94%  2025-05-07T19:45:55.6246099Z 2025-05-07T19:45:55.6246104Z 2025-05-07T19:45:55.6246109Z 2025-05-07T19:45:55.6246126Z 2025-05-07T19:45:55.6246131Z 2025-05-07T19:45:55.6246134Z 2025-05-07T19:45:55.6246138Z 2025-05-07T19:45:55.6410063Z cuda-nvvp-12.8.57 | 112.4 MB | ########1 | 81%  2025-05-07T19:45:55.6410393Z 2025-05-07T19:45:55.6410398Z 2025-05-07T19:45:55.6410401Z 2025-05-07T19:45:55.6410405Z 2025-05-07T19:45:55.6410409Z 2025-05-07T19:45:55.6410413Z 2025-05-07T19:45:55.6625953Z cuda-nsight-12.8.55 | 113.2 MB | ########3 | 83%  2025-05-07T19:45:55.6626314Z 2025-05-07T19:45:55.6626320Z 2025-05-07T19:45:55.6626337Z 2025-05-07T19:45:55.6626341Z 2025-05-07T19:45:55.6626345Z 2025-05-07T19:45:55.7246115Z libnpp-12.3.3.65 | 130.6 MB | 
#########8 | 98%  2025-05-07T19:45:55.7246446Z 2025-05-07T19:45:55.7246450Z 2025-05-07T19:45:55.7246454Z 2025-05-07T19:45:55.7246458Z 2025-05-07T19:45:55.7246461Z 2025-05-07T19:45:55.7246465Z 2025-05-07T19:45:55.7246468Z 2025-05-07T19:45:55.7407590Z cuda-nvvp-12.8.57 | 112.4 MB | ########6 | 87%  2025-05-07T19:45:55.7407920Z 2025-05-07T19:45:55.7407924Z 2025-05-07T19:45:55.7407928Z 2025-05-07T19:45:55.7407932Z 2025-05-07T19:45:55.7407950Z 2025-05-07T19:45:55.7407954Z 2025-05-07T19:45:55.8357381Z cuda-nsight-12.8.55 | 113.2 MB | ########9 | 89%  2025-05-07T19:45:55.8357729Z 2025-05-07T19:45:55.8357734Z 2025-05-07T19:45:55.8357739Z 2025-05-07T19:45:55.8357767Z 2025-05-07T19:45:55.8357772Z 2025-05-07T19:45:55.8357776Z 2025-05-07T19:45:55.8357795Z 2025-05-07T19:45:55.8643887Z cuda-nvvp-12.8.57 | 112.4 MB | #########1 | 92%  2025-05-07T19:45:55.8644240Z 2025-05-07T19:45:55.8644244Z 2025-05-07T19:45:55.8644247Z 2025-05-07T19:45:55.8644251Z 2025-05-07T19:45:55.8644255Z 2025-05-07T19:45:55.8644259Z 2025-05-07T19:45:55.9531983Z cuda-nsight-12.8.55 | 113.2 MB | #########4 | 95%  2025-05-07T19:45:55.9532341Z 2025-05-07T19:45:55.9532346Z 2025-05-07T19:45:55.9532349Z 2025-05-07T19:45:55.9532353Z 2025-05-07T19:45:55.9532357Z 2025-05-07T19:45:55.9532360Z 2025-05-07T19:45:55.9532364Z 2025-05-07T19:45:56.5526444Z cuda-nvvp-12.8.57 | 112.4 MB | #########6 | 97%  2025-05-07T19:45:56.5526817Z 2025-05-07T19:45:56.5526822Z 2025-05-07T19:45:56.5526826Z 2025-05-07T19:45:56.5526830Z 2025-05-07T19:45:57.0498267Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:45:57.0498651Z 2025-05-07T19:45:57.0498656Z 2025-05-07T19:45:57.0498660Z 2025-05-07T19:45:57.0498664Z 2025-05-07T19:45:57.0498681Z 2025-05-07T19:45:57.0570676Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:45:57.0570997Z 2025-05-07T19:45:57.0571002Z 2025-05-07T19:45:57.0571005Z 2025-05-07T19:45:57.0571009Z 2025-05-07T19:45:57.0571012Z 2025-05-07T19:45:57.0571015Z 2025-05-07T19:45:57.0571019Z 
2025-05-07T19:45:57.0904120Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:45:57.0904477Z 2025-05-07T19:45:57.0904481Z 2025-05-07T19:45:57.0904486Z 2025-05-07T19:45:57.0904489Z 2025-05-07T19:45:57.0904493Z 2025-05-07T19:45:57.0904496Z 2025-05-07T19:45:57.0904770Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:45:57.0905085Z 2025-05-07T19:45:57.0905089Z 2025-05-07T19:45:57.0905092Z 2025-05-07T19:45:57.0905096Z 2025-05-07T19:45:57.0905099Z 2025-05-07T19:45:57.0905102Z 2025-05-07T19:45:57.1141388Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:45:57.1141727Z 2025-05-07T19:45:57.1141941Z 2025-05-07T19:45:57.1141963Z 2025-05-07T19:45:57.1141967Z 2025-05-07T19:45:57.1141970Z 2025-05-07T19:45:57.1141973Z 2025-05-07T19:45:57.1141977Z 2025-05-07T19:45:57.1141980Z 2025-05-07T19:45:57.1209436Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:45:57.1209773Z 2025-05-07T19:45:57.1209792Z 2025-05-07T19:45:57.1209796Z 2025-05-07T19:45:57.1209800Z 2025-05-07T19:45:57.1209803Z 2025-05-07T19:45:57.1209806Z 2025-05-07T19:45:57.1209810Z 2025-05-07T19:45:57.1209813Z 2025-05-07T19:45:57.1209817Z 2025-05-07T19:45:57.1613070Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:45:57.1613410Z 2025-05-07T19:45:57.1613574Z 2025-05-07T19:45:57.1613583Z 2025-05-07T19:45:57.1613587Z 2025-05-07T19:45:57.1613592Z 2025-05-07T19:45:57.1613596Z 2025-05-07T19:45:57.1613621Z 2025-05-07T19:45:57.1613626Z 2025-05-07T19:45:57.1613639Z 2025-05-07T19:45:57.1613643Z 2025-05-07T19:45:57.2141970Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:45:57.2142320Z 2025-05-07T19:45:57.2142343Z 2025-05-07T19:45:57.2142347Z 2025-05-07T19:45:57.2142350Z 2025-05-07T19:45:57.2142354Z 2025-05-07T19:45:57.2142357Z 2025-05-07T19:45:57.2142361Z 2025-05-07T19:45:57.2142364Z 2025-05-07T19:45:57.2209311Z cuda-nvrtc-12.8.61 | 63.1 MB | #2 | 13%  2025-05-07T19:45:57.2209712Z 2025-05-07T19:45:57.2209931Z 2025-05-07T19:45:57.2209939Z 2025-05-07T19:45:57.2209945Z 
2025-05-07T19:45:57.2209949Z 2025-05-07T19:45:57.2209954Z 2025-05-07T19:45:57.2209958Z 2025-05-07T19:45:57.2209963Z 2025-05-07T19:45:57.2211478Z 2025-05-07T19:45:57.2684425Z libcurand-10.3.9.55 | 43.6 MB | #2 | 12%  2025-05-07T19:45:57.2684992Z 2025-05-07T19:45:57.2684998Z 2025-05-07T19:45:57.2685003Z 2025-05-07T19:45:57.2685026Z 2025-05-07T19:45:57.2685029Z 2025-05-07T19:45:57.2685034Z 2025-05-07T19:45:57.2685037Z 2025-05-07T19:45:57.2685058Z 2025-05-07T19:45:57.2685062Z 2025-05-07T19:45:57.2685065Z 2025-05-07T19:45:57.3142569Z gds-tools-1.13.0.11 | 37.9 MB | # | 11%  2025-05-07T19:45:57.3142928Z 2025-05-07T19:45:57.3142933Z 2025-05-07T19:45:57.3142937Z 2025-05-07T19:45:57.3142941Z 2025-05-07T19:45:57.3142945Z 2025-05-07T19:45:57.3142949Z 2025-05-07T19:45:57.3142953Z 2025-05-07T19:45:57.3142956Z 2025-05-07T19:45:57.3211711Z cuda-nvrtc-12.8.61 | 63.1 MB | ##3 | 23%  2025-05-07T19:45:57.3212065Z 2025-05-07T19:45:57.3212069Z 2025-05-07T19:45:57.3212073Z 2025-05-07T19:45:57.3212077Z 2025-05-07T19:45:57.3212080Z 2025-05-07T19:45:57.3212083Z 2025-05-07T19:45:57.3212087Z 2025-05-07T19:45:57.3212091Z 2025-05-07T19:45:57.3212095Z 2025-05-07T19:45:57.3822215Z libcurand-10.3.9.55 | 43.6 MB | ##7 | 28%  2025-05-07T19:45:57.3822591Z 2025-05-07T19:45:57.3822596Z 2025-05-07T19:45:57.3822601Z 2025-05-07T19:45:57.3822618Z 2025-05-07T19:45:57.3822621Z 2025-05-07T19:45:57.3822625Z 2025-05-07T19:45:57.3822628Z 2025-05-07T19:45:57.3822631Z 2025-05-07T19:45:57.3822635Z 2025-05-07T19:45:57.3822638Z 2025-05-07T19:45:57.4142931Z gds-tools-1.13.0.11 | 37.9 MB | #6 | 17%  2025-05-07T19:45:57.4143278Z 2025-05-07T19:45:57.4143282Z 2025-05-07T19:45:57.4143286Z 2025-05-07T19:45:57.4143289Z 2025-05-07T19:45:57.4143293Z 2025-05-07T19:45:57.4143296Z 2025-05-07T19:45:57.4143299Z 2025-05-07T19:45:57.4143302Z 2025-05-07T19:45:57.4209518Z cuda-nvrtc-12.8.61 | 63.1 MB | ###3 | 34%  2025-05-07T19:45:57.4209847Z 2025-05-07T19:45:57.4209852Z 2025-05-07T19:45:57.4209856Z 2025-05-07T19:45:57.4209860Z 
2025-05-07T19:45:57.4209865Z 2025-05-07T19:45:57.4209869Z 2025-05-07T19:45:57.4209873Z 2025-05-07T19:45:57.4210118Z 2025-05-07T19:45:57.4210489Z 2025-05-07T19:45:57.4858940Z libcurand-10.3.9.55 | 43.6 MB | ####1 | 42%  2025-05-07T19:45:57.4859303Z 2025-05-07T19:45:57.4859308Z 2025-05-07T19:45:57.4859311Z 2025-05-07T19:45:57.4859315Z 2025-05-07T19:45:57.4859318Z 2025-05-07T19:45:57.4859322Z 2025-05-07T19:45:57.4859325Z 2025-05-07T19:45:57.4859329Z 2025-05-07T19:45:57.4859332Z 2025-05-07T19:45:57.4859336Z 2025-05-07T19:45:57.5144224Z gds-tools-1.13.0.11 | 37.9 MB | ##6 | 27%  2025-05-07T19:45:57.5144566Z 2025-05-07T19:45:57.5144570Z 2025-05-07T19:45:57.5144574Z 2025-05-07T19:45:57.5144577Z 2025-05-07T19:45:57.5144581Z 2025-05-07T19:45:57.5144584Z 2025-05-07T19:45:57.5144588Z 2025-05-07T19:45:57.5144592Z 2025-05-07T19:45:57.5217554Z cuda-nvrtc-12.8.61 | 63.1 MB | ####3 | 44%  2025-05-07T19:45:57.5217890Z 2025-05-07T19:45:57.5217894Z 2025-05-07T19:45:57.5217898Z 2025-05-07T19:45:57.5217902Z 2025-05-07T19:45:57.5217923Z 2025-05-07T19:45:57.5217927Z 2025-05-07T19:45:57.5217932Z 2025-05-07T19:45:57.5217935Z 2025-05-07T19:45:57.5217949Z 2025-05-07T19:45:57.6221385Z libcurand-10.3.9.55 | 43.6 MB | #####5 | 55%  2025-05-07T19:45:57.6221729Z 2025-05-07T19:45:57.6221735Z 2025-05-07T19:45:57.6221739Z 2025-05-07T19:45:57.6221742Z 2025-05-07T19:45:57.6221746Z 2025-05-07T19:45:57.6221749Z 2025-05-07T19:45:57.6221752Z 2025-05-07T19:45:57.6221767Z 2025-05-07T19:45:57.6221771Z 2025-05-07T19:45:57.6582095Z libcurand-10.3.9.55 | 43.6 MB | #######5 | 75%  2025-05-07T19:45:57.6582438Z 2025-05-07T19:45:57.6582442Z 2025-05-07T19:45:57.6582446Z 2025-05-07T19:45:57.6582449Z 2025-05-07T19:45:57.6582454Z 2025-05-07T19:45:57.6582457Z 2025-05-07T19:45:57.6582474Z 2025-05-07T19:45:57.6582478Z 2025-05-07T19:45:57.6655173Z cuda-nvrtc-12.8.61 | 63.1 MB | #####3 | 53%  2025-05-07T19:45:57.6655496Z 2025-05-07T19:45:57.6655525Z 2025-05-07T19:45:57.6655528Z 2025-05-07T19:45:57.6655532Z 
2025-05-07T19:45:57.6655535Z 2025-05-07T19:45:57.6655552Z 2025-05-07T19:45:57.6655568Z 2025-05-07T19:45:57.6655571Z 2025-05-07T19:45:57.6655575Z 2025-05-07T19:45:57.6655578Z 2025-05-07T19:45:57.6982069Z gds-tools-1.13.0.11 | 37.9 MB | ###4 | 34%  2025-05-07T19:45:57.6982411Z 2025-05-07T19:45:57.6982416Z 2025-05-07T19:45:57.7575457Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:45:57.7575775Z 2025-05-07T19:45:57.7575779Z 2025-05-07T19:45:57.7575783Z 2025-05-07T19:45:57.7575786Z 2025-05-07T19:45:57.7575790Z 2025-05-07T19:45:57.7575793Z 2025-05-07T19:45:57.7575797Z 2025-05-07T19:45:57.7575802Z 2025-05-07T19:45:57.7575808Z 2025-05-07T19:45:57.7768869Z libcurand-10.3.9.55 | 43.6 MB | ######### | 90%  2025-05-07T19:45:57.7769229Z 2025-05-07T19:45:57.7769235Z 2025-05-07T19:45:57.7769238Z 2025-05-07T19:45:57.7769242Z 2025-05-07T19:45:57.7769264Z 2025-05-07T19:45:57.7769267Z 2025-05-07T19:45:57.7769271Z 2025-05-07T19:45:57.7769274Z 2025-05-07T19:45:57.8769299Z cuda-nvrtc-12.8.61 | 63.1 MB | ######2 | 62%  2025-05-07T19:45:57.8769647Z 2025-05-07T19:45:57.8769652Z 2025-05-07T19:45:57.8769656Z 2025-05-07T19:45:57.8769660Z 2025-05-07T19:45:57.8769664Z 2025-05-07T19:45:57.8769667Z 2025-05-07T19:45:57.8769671Z 2025-05-07T19:45:57.8769674Z 2025-05-07T19:45:57.9770962Z cuda-nvrtc-12.8.61 | 63.1 MB | #######6 | 76%  2025-05-07T19:45:57.9771304Z 2025-05-07T19:45:57.9771309Z 2025-05-07T19:45:57.9771313Z 2025-05-07T19:45:57.9771316Z 2025-05-07T19:45:57.9771319Z 2025-05-07T19:45:57.9771323Z 2025-05-07T19:45:57.9771327Z 2025-05-07T19:45:57.9771330Z 2025-05-07T19:45:58.1071371Z cuda-nvrtc-12.8.61 | 63.1 MB | #########2 | 92%  2025-05-07T19:45:58.1071740Z 2025-05-07T19:45:58.1071745Z 2025-05-07T19:45:58.1071749Z 2025-05-07T19:45:58.1071989Z 2025-05-07T19:45:58.1071993Z 2025-05-07T19:45:58.1071997Z 2025-05-07T19:45:58.1072001Z 2025-05-07T19:45:58.1072005Z 2025-05-07T19:45:58.1072121Z 2025-05-07T19:45:58.1072125Z 2025-05-07T19:45:58.1323496Z gds-tools-1.13.0.11 | 37.9 MB 
| #### | 40%  2025-05-07T19:45:58.1323828Z 2025-05-07T19:45:58.1323833Z 2025-05-07T19:45:58.1323837Z 2025-05-07T19:45:58.2071475Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:45:58.2071811Z 2025-05-07T19:45:58.2071816Z 2025-05-07T19:45:58.2071820Z 2025-05-07T19:45:58.2071824Z 2025-05-07T19:45:58.2071827Z 2025-05-07T19:45:58.2071831Z 2025-05-07T19:45:58.2071834Z 2025-05-07T19:45:58.2071837Z 2025-05-07T19:45:58.2071841Z 2025-05-07T19:45:58.2071844Z 2025-05-07T19:45:58.3190751Z gds-tools-1.13.0.11 | 37.9 MB | ###### | 60%  2025-05-07T19:45:58.3191109Z 2025-05-07T19:45:58.3191114Z 2025-05-07T19:45:58.3191118Z 2025-05-07T19:45:58.3191121Z 2025-05-07T19:45:58.3191143Z 2025-05-07T19:45:58.3191147Z 2025-05-07T19:45:58.3191151Z 2025-05-07T19:45:58.3191155Z 2025-05-07T19:45:58.3191169Z 2025-05-07T19:45:58.3191173Z 2025-05-07T19:45:58.3854674Z gds-tools-1.13.0.11 | 37.9 MB | ######9 | 70%  2025-05-07T19:45:58.3855020Z 2025-05-07T19:45:58.3855024Z 2025-05-07T19:45:58.3855028Z 2025-05-07T19:45:58.3855031Z 2025-05-07T19:45:58.3855035Z 2025-05-07T19:45:58.3855055Z 2025-05-07T19:45:58.3855058Z 2025-05-07T19:45:58.3855062Z 2025-05-07T19:45:58.3855065Z 2025-05-07T19:45:58.4237595Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%  2025-05-07T19:45:58.4237941Z 2025-05-07T19:45:58.4237945Z 2025-05-07T19:45:58.4237949Z 2025-05-07T19:45:58.4237953Z 2025-05-07T19:45:58.4237972Z 2025-05-07T19:45:58.4237975Z 2025-05-07T19:45:58.4237979Z 2025-05-07T19:45:58.4237982Z 2025-05-07T19:45:58.4237985Z 2025-05-07T19:45:58.4237990Z 2025-05-07T19:45:58.4783875Z gds-tools-1.13.0.11 | 37.9 MB | #######8 | 79%  2025-05-07T19:45:58.4784252Z 2025-05-07T19:45:58.4784256Z 2025-05-07T19:45:58.4784271Z 2025-05-07T19:45:58.4784275Z 2025-05-07T19:45:58.4784279Z 2025-05-07T19:45:58.4784282Z 2025-05-07T19:45:58.4784285Z 2025-05-07T19:45:58.4784289Z 2025-05-07T19:45:58.4784309Z 2025-05-07T19:45:58.4784313Z 2025-05-07T19:45:58.4784316Z 2025-05-07T19:45:58.5253825Z libnvjitlink-12.8.61 | 28.7 
MB | | 0%  2025-05-07T19:45:58.5254194Z 2025-05-07T19:45:58.5254198Z 2025-05-07T19:45:58.5254202Z 2025-05-07T19:45:58.5254206Z 2025-05-07T19:45:58.5254209Z 2025-05-07T19:45:58.5254227Z 2025-05-07T19:45:58.5254231Z 2025-05-07T19:45:58.5254234Z 2025-05-07T19:45:58.5254238Z 2025-05-07T19:45:58.5254241Z 2025-05-07T19:45:58.5784383Z gds-tools-1.13.0.11 | 37.9 MB | ########7 | 88%  2025-05-07T19:45:58.5784734Z 2025-05-07T19:45:58.5784739Z 2025-05-07T19:45:58.5784759Z 2025-05-07T19:45:58.5784785Z 2025-05-07T19:45:58.5784789Z 2025-05-07T19:45:58.5784793Z 2025-05-07T19:45:58.5784797Z 2025-05-07T19:45:58.5784801Z 2025-05-07T19:45:58.5784823Z 2025-05-07T19:45:58.5784827Z 2025-05-07T19:45:58.5784830Z 2025-05-07T19:45:58.6785206Z libnvjitlink-12.8.61 | 28.7 MB | #6 | 17%  2025-05-07T19:45:58.6785573Z 2025-05-07T19:45:58.6785578Z 2025-05-07T19:45:58.6785584Z 2025-05-07T19:45:58.6785589Z 2025-05-07T19:45:58.6785593Z 2025-05-07T19:45:58.6785597Z 2025-05-07T19:45:58.6785600Z 2025-05-07T19:45:58.6785604Z 2025-05-07T19:45:58.6785607Z 2025-05-07T19:45:58.6785612Z 2025-05-07T19:45:58.6785623Z 2025-05-07T19:45:58.7139980Z libnvjitlink-12.8.61 | 28.7 MB | ###7 | 37%  2025-05-07T19:45:58.7140387Z 2025-05-07T19:45:58.7140614Z 2025-05-07T19:45:58.7140622Z 2025-05-07T19:45:58.7140628Z 2025-05-07T19:45:58.7140632Z 2025-05-07T19:45:58.7140637Z 2025-05-07T19:45:58.7140642Z 2025-05-07T19:45:58.7140871Z 2025-05-07T19:45:58.7140880Z 2025-05-07T19:45:58.7140885Z 2025-05-07T19:45:58.7785825Z gds-tools-1.13.0.11 | 37.9 MB | #########6 | 97%  2025-05-07T19:45:58.7786170Z 2025-05-07T19:45:58.7786174Z 2025-05-07T19:45:58.7786178Z 2025-05-07T19:45:58.7786182Z 2025-05-07T19:45:58.7786185Z 2025-05-07T19:45:58.7786189Z 2025-05-07T19:45:58.7786192Z 2025-05-07T19:45:58.7786195Z 2025-05-07T19:45:58.7786199Z 2025-05-07T19:45:58.7786202Z 2025-05-07T19:45:58.7786398Z 2025-05-07T19:45:58.8087848Z libnvjitlink-12.8.61 | 28.7 MB | #####2 | 53%  2025-05-07T19:45:58.8088236Z 2025-05-07T19:45:58.8088241Z 
2025-05-07T19:45:58.8088244Z 2025-05-07T19:45:58.8088248Z 2025-05-07T19:45:58.8088251Z 2025-05-07T19:45:58.8088254Z 2025-05-07T19:45:58.8088258Z 2025-05-07T19:45:58.8088261Z 2025-05-07T19:45:58.8492014Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%  2025-05-07T19:45:58.8492349Z 2025-05-07T19:45:58.8492374Z 2025-05-07T19:45:58.8492377Z 2025-05-07T19:45:58.8492381Z 2025-05-07T19:45:58.8492386Z 2025-05-07T19:45:58.8492389Z 2025-05-07T19:45:58.8492411Z 2025-05-07T19:45:58.8492430Z 2025-05-07T19:45:58.8492434Z 2025-05-07T19:45:58.8492437Z 2025-05-07T19:45:58.8492440Z 2025-05-07T19:45:58.8492444Z 2025-05-07T19:45:58.8813749Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:45:58.8814379Z 2025-05-07T19:45:58.8814398Z 2025-05-07T19:45:58.8814403Z 2025-05-07T19:45:58.8814408Z 2025-05-07T19:45:58.8814413Z 2025-05-07T19:45:58.8814416Z 2025-05-07T19:45:58.8814421Z 2025-05-07T19:45:58.8814425Z 2025-05-07T19:45:58.8814431Z 2025-05-07T19:45:58.8814435Z 2025-05-07T19:45:58.8814465Z 2025-05-07T19:45:58.9493458Z libnvjitlink-12.8.61 | 28.7 MB | #######2 | 73%  2025-05-07T19:45:58.9493815Z 2025-05-07T19:45:58.9493820Z 2025-05-07T19:45:58.9493824Z 2025-05-07T19:45:58.9493840Z 2025-05-07T19:45:58.9493843Z 2025-05-07T19:45:58.9493869Z 2025-05-07T19:45:58.9493872Z 2025-05-07T19:45:58.9493876Z 2025-05-07T19:45:58.9493879Z 2025-05-07T19:45:58.9493893Z 2025-05-07T19:45:58.9493896Z 2025-05-07T19:45:58.9493900Z 2025-05-07T19:45:58.9822246Z cuda-nvcc-tools-12.8 | 24.5 MB | ###4 | 34%  2025-05-07T19:45:58.9822628Z 2025-05-07T19:45:58.9822633Z 2025-05-07T19:45:58.9822636Z 2025-05-07T19:45:58.9822640Z 2025-05-07T19:45:58.9822643Z 2025-05-07T19:45:58.9822647Z 2025-05-07T19:45:58.9822651Z 2025-05-07T19:45:58.9822654Z 2025-05-07T19:45:58.9822657Z 2025-05-07T19:45:58.9822661Z 2025-05-07T19:45:58.9823363Z 2025-05-07T19:45:59.0493706Z libnvjitlink-12.8.61 | 28.7 MB | ########9 | 90%  2025-05-07T19:45:59.0494084Z 2025-05-07T19:45:59.0494088Z 2025-05-07T19:45:59.0494092Z 
2025-05-07T19:45:59.0494095Z 2025-05-07T19:45:59.0494098Z 2025-05-07T19:45:59.0494102Z 2025-05-07T19:45:59.0494105Z 2025-05-07T19:45:59.0494109Z 2025-05-07T19:45:59.0494129Z 2025-05-07T19:45:59.0494133Z 2025-05-07T19:45:59.0494136Z 2025-05-07T19:45:59.0494140Z 2025-05-07T19:45:59.1611625Z cuda-nvcc-tools-12.8 | 24.5 MB | ######5 | 65%  2025-05-07T19:45:59.1611989Z 2025-05-07T19:45:59.2277137Z nsight-compute-2025. | 320.6 MB | ########## | 100%  2025-05-07T19:45:59.2277450Z 2025-05-07T19:45:59.2277455Z 2025-05-07T19:45:59.2277458Z 2025-05-07T19:45:59.2277462Z 2025-05-07T19:45:59.2277465Z 2025-05-07T19:45:59.2277469Z 2025-05-07T19:45:59.2277473Z 2025-05-07T19:45:59.2277477Z 2025-05-07T19:45:59.2277481Z 2025-05-07T19:45:59.2277484Z 2025-05-07T19:45:59.2277487Z 2025-05-07T19:45:59.2277491Z 2025-05-07T19:45:59.2277494Z 2025-05-07T19:45:59.2500036Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:45:59.2500652Z 2025-05-07T19:45:59.2500657Z 2025-05-07T19:45:59.2500755Z 2025-05-07T19:45:59.2500764Z 2025-05-07T19:45:59.2500770Z 2025-05-07T19:45:59.2501006Z 2025-05-07T19:45:59.2694067Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  2025-05-07T19:45:59.2694672Z 2025-05-07T19:45:59.2694678Z 2025-05-07T19:45:59.2694683Z 2025-05-07T19:45:59.2694686Z 2025-05-07T19:45:59.2694690Z 2025-05-07T19:45:59.2694693Z 2025-05-07T19:45:59.2694697Z 2025-05-07T19:45:59.2694700Z 2025-05-07T19:45:59.2694703Z 2025-05-07T19:45:59.2694707Z 2025-05-07T19:45:59.3212237Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:45:59.3212637Z 2025-05-07T19:45:59.3212642Z 2025-05-07T19:45:59.3212647Z 2025-05-07T19:45:59.3212652Z 2025-05-07T19:45:59.3212656Z 2025-05-07T19:45:59.3212660Z 2025-05-07T19:45:59.3212665Z 2025-05-07T19:45:59.3212669Z 2025-05-07T19:45:59.3212674Z 2025-05-07T19:45:59.3212680Z 2025-05-07T19:45:59.3212683Z 2025-05-07T19:45:59.3212686Z 2025-05-07T19:45:59.3212690Z 2025-05-07T19:45:59.3212694Z 2025-05-07T19:45:59.3279951Z cuda-nvvm-impl-12.8. 
| 20.8 MB | | 0%  2025-05-07T19:45:59.3280333Z 2025-05-07T19:45:59.3280338Z 2025-05-07T19:45:59.3280341Z 2025-05-07T19:45:59.3280356Z 2025-05-07T19:45:59.3280360Z 2025-05-07T19:45:59.3280363Z 2025-05-07T19:45:59.3280367Z 2025-05-07T19:45:59.3280370Z 2025-05-07T19:45:59.3280373Z 2025-05-07T19:45:59.3280377Z 2025-05-07T19:45:59.3280380Z 2025-05-07T19:45:59.3280383Z 2025-05-07T19:45:59.3280387Z 2025-05-07T19:45:59.4212044Z cuda-nvvm-tools-12.8 | 23.5 MB | ###8 | 38%  2025-05-07T19:45:59.4212419Z 2025-05-07T19:45:59.4212424Z 2025-05-07T19:45:59.4212428Z 2025-05-07T19:45:59.4212432Z 2025-05-07T19:45:59.4212435Z 2025-05-07T19:45:59.4212439Z 2025-05-07T19:45:59.4212443Z 2025-05-07T19:45:59.4212447Z 2025-05-07T19:45:59.4212451Z 2025-05-07T19:45:59.4212455Z 2025-05-07T19:45:59.4212459Z 2025-05-07T19:45:59.4212463Z 2025-05-07T19:45:59.4212467Z 2025-05-07T19:45:59.4212471Z 2025-05-07T19:45:59.4280115Z cuda-nvvm-impl-12.8. | 20.8 MB | ###4 | 34%  2025-05-07T19:45:59.4280519Z 2025-05-07T19:45:59.4280536Z 2025-05-07T19:45:59.4280539Z 2025-05-07T19:45:59.4280543Z 2025-05-07T19:45:59.4280546Z 2025-05-07T19:45:59.4280550Z 2025-05-07T19:45:59.4280554Z 2025-05-07T19:45:59.4280557Z 2025-05-07T19:45:59.4280560Z 2025-05-07T19:45:59.4280564Z 2025-05-07T19:45:59.4280567Z 2025-05-07T19:45:59.4280570Z 2025-05-07T19:45:59.4280574Z 2025-05-07T19:45:59.4782294Z cuda-nvvm-tools-12.8 | 23.5 MB | ####### | 70%  2025-05-07T19:45:59.4782663Z 2025-05-07T19:45:59.4782668Z 2025-05-07T19:45:59.4782673Z 2025-05-07T19:45:59.4782677Z 2025-05-07T19:45:59.4782684Z 2025-05-07T19:45:59.4782688Z 2025-05-07T19:45:59.4782692Z 2025-05-07T19:45:59.4782696Z 2025-05-07T19:45:59.4782699Z 2025-05-07T19:45:59.4782702Z 2025-05-07T19:45:59.4782720Z 2025-05-07T19:45:59.4782748Z 2025-05-07T19:45:59.4783225Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:45:59.4783578Z 2025-05-07T19:45:59.4783590Z 2025-05-07T19:45:59.4783594Z 2025-05-07T19:45:59.4783607Z 2025-05-07T19:45:59.4783611Z 
2025-05-07T19:45:59.4783628Z 2025-05-07T19:45:59.4783631Z 2025-05-07T19:45:59.4783634Z 2025-05-07T19:45:59.4783638Z 2025-05-07T19:45:59.4783641Z 2025-05-07T19:45:59.4783644Z 2025-05-07T19:45:59.4783648Z 2025-05-07T19:45:59.4858228Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:45:59.4858613Z 2025-05-07T19:45:59.4858618Z 2025-05-07T19:45:59.4858622Z 2025-05-07T19:45:59.4858625Z 2025-05-07T19:45:59.4858628Z 2025-05-07T19:45:59.4858632Z 2025-05-07T19:45:59.4858635Z 2025-05-07T19:45:59.4858639Z 2025-05-07T19:45:59.4858642Z 2025-05-07T19:45:59.4858645Z 2025-05-07T19:45:59.5210323Z 2025-05-07T19:45:59.5210886Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%  2025-05-07T19:45:59.5211233Z 2025-05-07T19:45:59.5211454Z 2025-05-07T19:45:59.5211459Z 2025-05-07T19:45:59.5211463Z 2025-05-07T19:45:59.5211468Z 2025-05-07T19:45:59.5211472Z 2025-05-07T19:45:59.5211589Z 2025-05-07T19:45:59.5211609Z 2025-05-07T19:45:59.5211613Z 2025-05-07T19:45:59.5211616Z 2025-05-07T19:45:59.5211619Z 2025-05-07T19:45:59.5211623Z 2025-05-07T19:45:59.5211626Z 2025-05-07T19:45:59.5211629Z 2025-05-07T19:45:59.5261623Z cuda-nvvm-impl-12.8. 
| 20.8 MB | #######5 | 75%  2025-05-07T19:45:59.5262009Z 2025-05-07T19:45:59.5262014Z 2025-05-07T19:45:59.5262017Z 2025-05-07T19:45:59.5262021Z 2025-05-07T19:45:59.5262025Z 2025-05-07T19:45:59.5262028Z 2025-05-07T19:45:59.5262032Z 2025-05-07T19:45:59.5262035Z 2025-05-07T19:45:59.5262038Z 2025-05-07T19:45:59.5262042Z 2025-05-07T19:45:59.5262045Z 2025-05-07T19:45:59.5262048Z 2025-05-07T19:45:59.5262051Z 2025-05-07T19:45:59.5262055Z 2025-05-07T19:45:59.5262058Z 2025-05-07T19:45:59.5379167Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:45:59.5379658Z 2025-05-07T19:45:59.5379662Z 2025-05-07T19:45:59.5379666Z 2025-05-07T19:45:59.5379678Z 2025-05-07T19:45:59.5379682Z 2025-05-07T19:45:59.5379685Z 2025-05-07T19:45:59.5379689Z 2025-05-07T19:45:59.5379692Z 2025-05-07T19:45:59.5379695Z 2025-05-07T19:45:59.5379699Z 2025-05-07T19:45:59.5379702Z 2025-05-07T19:45:59.5379705Z 2025-05-07T19:45:59.5379709Z 2025-05-07T19:45:59.5550392Z cuda-nvvm-tools-12.8 | 23.5 MB | #########7 | 97%  2025-05-07T19:45:59.5550776Z 2025-05-07T19:45:59.5550780Z 2025-05-07T19:45:59.5550784Z 2025-05-07T19:45:59.5550787Z 2025-05-07T19:45:59.5550791Z 2025-05-07T19:45:59.5550794Z 2025-05-07T19:45:59.5550797Z 2025-05-07T19:45:59.5550801Z 2025-05-07T19:45:59.5550804Z 2025-05-07T19:45:59.5550808Z 2025-05-07T19:45:59.5550812Z 2025-05-07T19:45:59.5550816Z 2025-05-07T19:45:59.5550819Z 2025-05-07T19:45:59.5550823Z 2025-05-07T19:45:59.5550826Z 2025-05-07T19:45:59.5553622Z 2025-05-07T19:45:59.6262043Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:45:59.6262434Z 2025-05-07T19:45:59.6262439Z 2025-05-07T19:45:59.6262443Z 2025-05-07T19:45:59.6262446Z 2025-05-07T19:45:59.6262450Z 2025-05-07T19:45:59.6262453Z 2025-05-07T19:45:59.6262457Z 2025-05-07T19:45:59.6262460Z 2025-05-07T19:45:59.6262464Z 2025-05-07T19:45:59.6262467Z 2025-05-07T19:45:59.6262470Z 2025-05-07T19:45:59.6262474Z 2025-05-07T19:45:59.6262490Z 2025-05-07T19:45:59.6262494Z 2025-05-07T19:45:59.6262497Z 2025-05-07T19:45:59.6551286Z 
cuda-nvcc-dev_linux- | 12.7 MB | ###5 | 35%  2025-05-07T19:45:59.6551669Z 2025-05-07T19:45:59.6551673Z 2025-05-07T19:45:59.6551678Z 2025-05-07T19:45:59.6551682Z 2025-05-07T19:45:59.6551700Z 2025-05-07T19:45:59.6551703Z 2025-05-07T19:45:59.6551706Z 2025-05-07T19:45:59.6551710Z 2025-05-07T19:45:59.6551715Z 2025-05-07T19:45:59.6551718Z 2025-05-07T19:45:59.6551743Z 2025-05-07T19:45:59.6551747Z 2025-05-07T19:45:59.6551750Z 2025-05-07T19:45:59.6551753Z 2025-05-07T19:45:59.6551770Z 2025-05-07T19:45:59.6551773Z 2025-05-07T19:45:59.7262143Z cuda-sanitizer-api-1 | 8.8 MB | #######2 | 73%  2025-05-07T19:45:59.7262552Z 2025-05-07T19:45:59.7262557Z 2025-05-07T19:45:59.7262561Z 2025-05-07T19:45:59.7262568Z 2025-05-07T19:45:59.7262573Z 2025-05-07T19:45:59.7262577Z 2025-05-07T19:45:59.7262580Z 2025-05-07T19:45:59.7262584Z 2025-05-07T19:45:59.7262587Z 2025-05-07T19:45:59.7262590Z 2025-05-07T19:45:59.7262594Z 2025-05-07T19:45:59.7262597Z 2025-05-07T19:45:59.7262600Z 2025-05-07T19:45:59.7262604Z 2025-05-07T19:45:59.7262607Z 2025-05-07T19:45:59.8101813Z cuda-nvcc-dev_linux- | 12.7 MB | #########9 | 100%  2025-05-07T19:45:59.8102195Z 2025-05-07T19:45:59.8102200Z 2025-05-07T19:45:59.8102204Z 2025-05-07T19:45:59.8102208Z 2025-05-07T19:45:59.8102441Z 2025-05-07T19:45:59.8102445Z 2025-05-07T19:45:59.8102448Z 2025-05-07T19:45:59.8102451Z 2025-05-07T19:45:59.8102455Z 2025-05-07T19:45:59.8102574Z 2025-05-07T19:45:59.8102594Z 2025-05-07T19:45:59.8102598Z 2025-05-07T19:45:59.8102601Z 2025-05-07T19:45:59.8102605Z 2025-05-07T19:45:59.8102608Z 2025-05-07T19:45:59.8102611Z 2025-05-07T19:45:59.8536777Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%  2025-05-07T19:45:59.8537150Z 2025-05-07T19:45:59.8537154Z 2025-05-07T19:45:59.8537158Z 2025-05-07T19:45:59.8537162Z 2025-05-07T19:45:59.8537165Z 2025-05-07T19:45:59.8537169Z 2025-05-07T19:45:59.8537172Z 2025-05-07T19:45:59.8537176Z 2025-05-07T19:45:59.8537179Z 2025-05-07T19:45:59.8537182Z 2025-05-07T19:45:59.8537186Z 2025-05-07T19:45:59.8537189Z 
2025-05-07T19:45:59.8537193Z 2025-05-07T19:45:59.8537196Z 2025-05-07T19:45:59.8560614Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100%  2025-05-07T19:45:59.8560993Z 2025-05-07T19:45:59.8560997Z 2025-05-07T19:45:59.8561104Z 2025-05-07T19:45:59.8561114Z 2025-05-07T19:45:59.8561141Z 2025-05-07T19:45:59.8561146Z 2025-05-07T19:45:59.8561150Z 2025-05-07T19:45:59.8561155Z 2025-05-07T19:45:59.8561161Z 2025-05-07T19:45:59.8561166Z 2025-05-07T19:45:59.8561171Z 2025-05-07T19:45:59.8561175Z 2025-05-07T19:45:59.8561180Z 2025-05-07T19:45:59.8706665Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100%  2025-05-07T19:45:59.8707018Z 2025-05-07T19:45:59.8707036Z 2025-05-07T19:45:59.8707039Z 2025-05-07T19:45:59.8707043Z 2025-05-07T19:45:59.8707046Z 2025-05-07T19:45:59.8707050Z 2025-05-07T19:45:59.8707053Z 2025-05-07T19:45:59.8707056Z 2025-05-07T19:45:59.8707073Z 2025-05-07T19:45:59.8707077Z 2025-05-07T19:45:59.8707080Z 2025-05-07T19:45:59.8707083Z 2025-05-07T19:45:59.8707087Z 2025-05-07T19:45:59.8707090Z 2025-05-07T19:45:59.8707093Z 2025-05-07T19:45:59.8777230Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100%  2025-05-07T19:45:59.8777625Z 2025-05-07T19:45:59.8777630Z 2025-05-07T19:45:59.8777642Z 2025-05-07T19:45:59.8777646Z 2025-05-07T19:45:59.8777650Z 2025-05-07T19:45:59.8777654Z 2025-05-07T19:45:59.8777657Z 2025-05-07T19:45:59.8777660Z 2025-05-07T19:45:59.8777664Z 2025-05-07T19:45:59.8777667Z 2025-05-07T19:45:59.8777670Z 2025-05-07T19:45:59.8777674Z 2025-05-07T19:45:59.8777677Z 2025-05-07T19:45:59.8777693Z 2025-05-07T19:45:59.8777696Z 2025-05-07T19:45:59.8777700Z 2025-05-07T19:45:59.8777703Z 2025-05-07T19:45:59.9022317Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:45:59.9022696Z 2025-05-07T19:45:59.9022700Z 2025-05-07T19:45:59.9022704Z 2025-05-07T19:45:59.9022723Z 2025-05-07T19:45:59.9022726Z 2025-05-07T19:45:59.9022730Z 2025-05-07T19:45:59.9022733Z 2025-05-07T19:45:59.9120909Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100%  2025-05-07T19:45:59.9121262Z 
2025-05-07T19:45:59.9121266Z 2025-05-07T19:45:59.9121270Z 2025-05-07T19:45:59.9121288Z 2025-05-07T19:45:59.9121298Z 2025-05-07T19:45:59.9121302Z 2025-05-07T19:45:59.9121305Z 2025-05-07T19:45:59.9121309Z 2025-05-07T19:45:59.9121312Z 2025-05-07T19:45:59.9121315Z 2025-05-07T19:45:59.9121319Z 2025-05-07T19:45:59.9121322Z 2025-05-07T19:45:59.9121326Z 2025-05-07T19:45:59.9121329Z 2025-05-07T19:45:59.9121332Z 2025-05-07T19:45:59.9121336Z 2025-05-07T19:45:59.9121339Z 2025-05-07T19:45:59.9121343Z 2025-05-07T19:45:59.9121346Z 2025-05-07T19:45:59.9137852Z ... (more hidden) ... 2025-05-07T19:45:59.9138211Z 2025-05-07T19:45:59.9138320Z 2025-05-07T19:45:59.9138323Z 2025-05-07T19:45:59.9138327Z 2025-05-07T19:45:59.9138330Z 2025-05-07T19:45:59.9138333Z 2025-05-07T19:45:59.9138337Z 2025-05-07T19:45:59.9138340Z 2025-05-07T19:45:59.9138344Z 2025-05-07T19:45:59.9138361Z 2025-05-07T19:45:59.9138364Z 2025-05-07T19:45:59.9138569Z 2025-05-07T19:45:59.9138574Z 2025-05-07T19:45:59.9138577Z 2025-05-07T19:45:59.9138581Z 2025-05-07T19:45:59.9138584Z 2025-05-07T19:45:59.9138683Z 2025-05-07T19:45:59.9138692Z 2025-05-07T19:45:59.9927642Z cuda-cupti-dev-12.8. 
| 4.0 MB | | 0%  2025-05-07T19:45:59.9928042Z 2025-05-07T19:45:59.9928047Z 2025-05-07T19:45:59.9928051Z 2025-05-07T19:45:59.9928055Z 2025-05-07T19:45:59.9928060Z 2025-05-07T19:45:59.9928065Z 2025-05-07T19:45:59.9928069Z 2025-05-07T19:45:59.9928075Z 2025-05-07T19:45:59.9928079Z 2025-05-07T19:45:59.9928083Z 2025-05-07T19:45:59.9928087Z 2025-05-07T19:45:59.9928091Z 2025-05-07T19:45:59.9928094Z 2025-05-07T19:45:59.9928098Z 2025-05-07T19:45:59.9928101Z 2025-05-07T19:45:59.9928105Z 2025-05-07T19:45:59.9928108Z 2025-05-07T19:45:59.9928460Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:45:59.9928805Z 2025-05-07T19:45:59.9928826Z 2025-05-07T19:45:59.9928830Z 2025-05-07T19:45:59.9928833Z 2025-05-07T19:45:59.9928836Z 2025-05-07T19:45:59.9928840Z 2025-05-07T19:45:59.9928853Z 2025-05-07T19:45:59.9928856Z 2025-05-07T19:45:59.9928860Z 2025-05-07T19:45:59.9928863Z 2025-05-07T19:45:59.9928867Z 2025-05-07T19:45:59.9928870Z 2025-05-07T19:45:59.9928887Z 2025-05-07T19:45:59.9928891Z 2025-05-07T19:45:59.9928894Z 2025-05-07T19:45:59.9928897Z 2025-05-07T19:45:59.9928900Z 2025-05-07T19:46:00.0152091Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100%  2025-05-07T19:46:00.0152572Z 2025-05-07T19:46:00.0152641Z 2025-05-07T19:46:00.0152655Z 2025-05-07T19:46:00.0152659Z 2025-05-07T19:46:00.0152675Z 2025-05-07T19:46:00.0152679Z 2025-05-07T19:46:00.0152683Z 2025-05-07T19:46:00.0152728Z 2025-05-07T19:46:00.0152733Z 2025-05-07T19:46:00.0152746Z 2025-05-07T19:46:00.0152750Z 2025-05-07T19:46:00.0152764Z 2025-05-07T19:46:00.0152767Z 2025-05-07T19:46:00.0152809Z 2025-05-07T19:46:00.0152826Z 2025-05-07T19:46:00.0152830Z 2025-05-07T19:46:00.0152834Z 2025-05-07T19:46:00.0152838Z 2025-05-07T19:46:00.0152925Z 2025-05-07T19:46:00.0343981Z ... (more hidden) ... 
2025-05-07T19:46:00.0344337Z 2025-05-07T19:46:00.0344342Z 2025-05-07T19:46:00.0344345Z 2025-05-07T19:46:00.0344348Z 2025-05-07T19:46:00.0344352Z 2025-05-07T19:46:00.0344355Z 2025-05-07T19:46:00.0344359Z 2025-05-07T19:46:00.0344363Z 2025-05-07T19:46:00.0344366Z 2025-05-07T19:46:00.0344370Z 2025-05-07T19:46:00.0344374Z 2025-05-07T19:46:00.0344377Z 2025-05-07T19:46:00.0344381Z 2025-05-07T19:46:00.0344384Z 2025-05-07T19:46:00.0344387Z 2025-05-07T19:46:00.0344391Z 2025-05-07T19:46:00.0344394Z 2025-05-07T19:46:00.0344410Z 2025-05-07T19:46:00.0344760Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:00.0345110Z 2025-05-07T19:46:00.0345115Z 2025-05-07T19:46:00.0345118Z 2025-05-07T19:46:00.0345129Z 2025-05-07T19:46:00.0345132Z 2025-05-07T19:46:00.0345136Z 2025-05-07T19:46:00.0345139Z 2025-05-07T19:46:00.0345148Z 2025-05-07T19:46:00.0345164Z 2025-05-07T19:46:00.0345167Z 2025-05-07T19:46:00.0345171Z 2025-05-07T19:46:00.0345174Z 2025-05-07T19:46:00.0345177Z 2025-05-07T19:46:00.0345181Z 2025-05-07T19:46:00.0345184Z 2025-05-07T19:46:00.0345188Z 2025-05-07T19:46:00.0345191Z 2025-05-07T19:46:00.0345194Z 2025-05-07T19:46:00.0482164Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%  2025-05-07T19:46:00.0482561Z 2025-05-07T19:46:00.0482565Z 2025-05-07T19:46:00.0482569Z 2025-05-07T19:46:00.0482573Z 2025-05-07T19:46:00.0482576Z 2025-05-07T19:46:00.0482580Z 2025-05-07T19:46:00.0482583Z 2025-05-07T19:46:00.0482587Z 2025-05-07T19:46:00.0482591Z 2025-05-07T19:46:00.0482594Z 2025-05-07T19:46:00.0482597Z 2025-05-07T19:46:00.0482601Z 2025-05-07T19:46:00.0482604Z 2025-05-07T19:46:00.0482607Z 2025-05-07T19:46:00.0482852Z 2025-05-07T19:46:00.0482856Z 2025-05-07T19:46:00.0482860Z 2025-05-07T19:46:00.0482863Z 2025-05-07T19:46:00.0482866Z 2025-05-07T19:46:00.1781962Z ... (more hidden) ... 
2025-05-07T19:46:00.2599233Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100% 2025-05-07T19:46:00.2599562Z 2025-05-07T19:46:00.2599567Z 2025-05-07T19:46:00.2599572Z 2025-05-07T19:46:00.2599576Z 2025-05-07T19:46:00.2599581Z 2025-05-07T19:46:00.2599586Z 2025-05-07T19:46:00.2599591Z 2025-05-07T19:46:00.2599595Z 2025-05-07T19:46:00.2599599Z 2025-05-07T19:46:00.2679898Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100%  2025-05-07T19:46:00.2680237Z 2025-05-07T19:46:00.2680242Z 2025-05-07T19:46:00.2680280Z 2025-05-07T19:46:00.2680285Z 2025-05-07T19:46:00.2680290Z 2025-05-07T19:46:00.5075790Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:46:00.5076143Z 2025-05-07T19:46:00.5076149Z 2025-05-07T19:46:00.5076179Z 2025-05-07T19:46:00.5076184Z 2025-05-07T19:46:00.5076187Z 2025-05-07T19:46:00.5076191Z 2025-05-07T19:46:00.5076195Z 2025-05-07T19:46:00.5076218Z 2025-05-07T19:46:00.5076222Z 2025-05-07T19:46:00.5076225Z 2025-05-07T19:46:00.9301248Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100%  2025-05-07T19:46:00.9301638Z 2025-05-07T19:46:00.9301643Z 2025-05-07T19:46:00.9301647Z 2025-05-07T19:46:00.9301652Z 2025-05-07T19:46:00.9301656Z 2025-05-07T19:46:00.9301661Z 2025-05-07T19:46:00.9301665Z 2025-05-07T19:46:00.9301670Z 2025-05-07T19:46:00.9301677Z 2025-05-07T19:46:00.9301681Z 2025-05-07T19:46:00.9301685Z 2025-05-07T19:46:00.9301688Z 2025-05-07T19:46:01.0565705Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100%  2025-05-07T19:46:01.0566086Z 2025-05-07T19:46:01.0566107Z 2025-05-07T19:46:01.0566112Z 2025-05-07T19:46:01.0566117Z 2025-05-07T19:46:01.0566121Z 2025-05-07T19:46:01.0566126Z 2025-05-07T19:46:01.0566131Z 2025-05-07T19:46:01.0566166Z 2025-05-07T19:46:01.2166138Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%  2025-05-07T19:46:01.2166548Z 2025-05-07T19:46:01.2166554Z 2025-05-07T19:46:01.2166558Z 2025-05-07T19:46:01.2166562Z 2025-05-07T19:46:01.2166566Z 2025-05-07T19:46:01.2166573Z 2025-05-07T19:46:01.2166577Z 2025-05-07T19:46:01.2166581Z 
2025-05-07T19:46:01.3298774Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100% 2025-05-07T19:46:01.4707502Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100% 2025-05-07T19:46:01.6434150Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100% 2025-05-07T19:46:01.7261198Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100% 2025-05-07T19:46:01.7270924Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100% 2025-05-07T19:46:01.7737389Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100% 2025-05-07T19:46:01.8966523Z ... (more hidden) ... 2025-05-07T19:46:05.2528112Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100% 2025-05-07T19:46:05.4549762Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100% 2025-05-07T19:46:05.4564566Z nsight-compute-2025. | 320.6 MB | ########## | 100%
2025-05-07T19:46:05.4607702Z done 2025-05-07T19:46:05.6707182Z Preparing transaction: done 2025-05-07T19:46:06.3764104Z Verifying transaction: done 2025-05-07T19:46:06.6818830Z Executing transaction: done 2025-05-07T19:46:08.6630521Z [INSTALL] Fixing file placements for CUDA 12.8.0+ ...
2025-05-07T19:46:08.6631037Z [INSTALL] Creating symlinks: libnvToolsExt.so 2025-05-07T19:46:08.6631894Z + ln -sf /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:08.6632550Z 2025-05-07T19:46:08.6648182Z 2025-05-07T19:46:08.6650587Z + ln -sf /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:08.6653042Z 2025-05-07T19:46:08.6661717Z 2025-05-07T19:46:08.6662182Z [INSTALL] Copying nvtx3 headers ... 2025-05-07T19:46:08.6667062Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/include/ 2025-05-07T19:46:08.6671359Z 2025-05-07T19:46:08.6889108Z 2025-05-07T19:46:08.6898239Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h 
/github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/ 2025-05-07T19:46:08.6903116Z 2025-05-07T19:46:08.6912325Z 2025-05-07T19:46:08.6913111Z [INSTALL] Appending libcuda.so path to LD_LIBRARY_PATH ... 2025-05-07T19:46:08.7307104Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs ... 2025-05-07T19:46:10.5237987Z ERROR conda.cli.main_run:execute(125): `conda run printenv LD_LIBRARY_PATH` failed. (See above for error) 2025-05-07T19:46:10.5967668Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs 2025-05-07T19:46:10.5970010Z 2025-05-07T19:46:11.0048189Z 2025-05-07T19:46:11.0051282Z [INSTALL] Setting environment variable NVML_LIB_PATH ... 2025-05-07T19:46:11.0423507Z + conda env config vars set -n build_binary NVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:11.0425167Z 2025-05-07T19:46:11.4481307Z 2025-05-07T19:46:11.4481736Z [INSTALL] Setting environment variable CUDA_INCLUDE_DIRS ... 
2025-05-07T19:46:11.4483098Z + conda env config vars set -n build_binary CUDA_INCLUDE_DIRS="/github/home/miniconda/envs/build_binary/include/:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/" 2025-05-07T19:46:11.4483898Z 2025-05-07T19:46:11.8681827Z 2025-05-07T19:46:13.8570004Z [CHECK] cuda_runtime.h found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/cuda_runtime.h 2025-05-07T19:46:15.7870013Z [CHECK] libcuda.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:46:17.7161152Z [CHECK] libnvToolsExt.so found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:17.7163772Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:19.6741110Z [CHECK] libnvidia-ml.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:21.5300876Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:46:21.5301777Z 2025-05-07T19:46:21.5870535Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:46:25.3143414Z /tmp/tmpxoxufd94: line 3: clang: command not found 2025-05-07T19:46:25.3143754Z 2025-05-07T19:46:25.3144085Z ERROR conda.cli.main_run:execute(125): `conda run clang --version` failed. (See above for error) 2025-05-07T19:46:25.3925327Z + ls -la /github/home/miniconda/envs/build_binary/etc/conda/activate.d 2025-05-07T19:46:25.3926438Z 2025-05-07T19:46:25.4107470Z total 56 2025-05-07T19:46:25.4108319Z drwxr-xr-x. 2 root root 16384 May 7 19:46 . 2025-05-07T19:46:25.4109351Z drwxr-xr-x. 5 root root 62 May 7 19:44 .. 2025-05-07T19:46:25.4110637Z -rw-r--r--. 2 root root 3778 Jun 10 2024 activate-binutils_linux-64.sh 2025-05-07T19:46:25.4112082Z -rw-r--r--. 
2 root root 11630 Jun 10 2024 activate-gcc_linux-64.sh 2025-05-07T19:46:25.4113070Z -rw-r--r--. 2 root root 5190 Jun 10 2024 activate-gxx_linux-64.sh 2025-05-07T19:46:25.4113556Z -rw-r--r--. 2 root root 136 Mar 27 01:27 libglib_activate.sh 2025-05-07T19:46:25.4113990Z -rw-r--r--. 2 root root 872 May 7 16:10 libxml2_activate.sh 2025-05-07T19:46:25.4114441Z -rw-r--r--. 2 root root 499 Mar 28 22:35 openjdk_activate.sh 2025-05-07T19:46:25.4114886Z -rw-r--r--. 2 root root 2932 Jan 24 22:22 ~cuda-nvcc_activate.sh 2025-05-07T19:46:25.4115199Z 2025-05-07T19:46:25.4115438Z [INSTALL] Removing the -ccbin=CXX hook from NVCC activation scripts ... 2025-05-07T19:46:25.4116180Z + sed -i /-ccbin=/d /github/home/miniconda/envs/build_binary/etc/conda/activate.d/*cuda-nvcc_activate.sh 2025-05-07T19:46:25.4116648Z 2025-05-07T19:46:25.4136114Z 2025-05-07T19:46:25.4136702Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:46:25.4137016Z 2025-05-07T19:46:27.3150545Z 2025-05-07T19:46:27.3150984Z [BUILD] Setting prepend flags for NVCC ... 2025-05-07T19:46:27.3151596Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-allow-unsupported-compiler" 2025-05-07T19:46:27.3152014Z 2025-05-07T19:46:27.7407710Z 2025-05-07T19:46:27.7408163Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:46:27.7408485Z 2025-05-07T19:46:29.5434825Z -allow-unsupported-compiler 2025-05-07T19:46:29.5435394Z 2025-05-07T19:46:29.6020388Z 2025-05-07T19:46:29.6021177Z [INFO] Printing out all preprocessor defines in nvcc ... 2025-05-07T19:46:29.6021788Z + conda run -n build_binary nvcc --compiler-options -dM -E -x cu - < /dev/null 2025-05-07T19:46:29.6022469Z 2025-05-07T19:46:31.4620592Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:46:31.4621801Z #define _GLIBCXX_DEPRECATED_SUGGEST(ALT) __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) 2025-05-07T19:46:31.4622337Z 2025-05-07T19:46:31.4622495Z #define M_PIl 3.141592653589793238462643383279502884L 2025-05-07T19:46:31.4622845Z #define _IO_CURRENTLY_PUTTING 0x800 2025-05-07T19:46:31.4623196Z #define __W_EXITCODE(ret,sig) ((ret) << 8 | (sig)) 2025-05-07T19:46:31.4623536Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:46:31.4623828Z #define _STL_PAIR_H 1 2025-05-07T19:46:31.4624084Z #define __cpp_attributes 200809L 2025-05-07T19:46:31.4624440Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:46:31.4624824Z #define __DELETE_THROW throw() 2025-05-07T19:46:31.4625090Z #define _PTRDIFF_T_ 2025-05-07T19:46:31.4625359Z #define M_PI_4 0.78539816339744830962 2025-05-07T19:46:31.4625655Z #define __UINT_LEAST16_MAX__ 0xffff 2025-05-07T19:46:31.4625957Z #define _IO_LEFT 02 2025-05-07T19:46:31.4626189Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:46:31.4626473Z #define _POSIX2_BC_SCALE_MAX 99 2025-05-07T19:46:31.4626758Z #define _GLIBCXX_USE_RANDOM_TR1 1 2025-05-07T19:46:31.4627228Z #define _GLIBCXX_MOVE_BACKWARD3(_Tp,_Up,_Vp) std::move_backward(_Tp, _Up, _Vp) 2025-05-07T19:46:31.4627703Z #define __FLT128_MAX_10_EXP__ 4932 2025-05-07T19:46:31.4627990Z #define RE_DUP_MAX (0x7fff) 2025-05-07T19:46:31.4628266Z #define _IOS_OUTPUT 2 2025-05-07T19:46:31.4628509Z #define __SM_100_RT_HPP__ 2025-05-07T19:46:31.4628856Z #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F 2025-05-07T19:46:31.4629246Z #define toascii_l(c,l) __toascii_l ((c), (l)) 2025-05-07T19:46:31.4629589Z #define __GCC_IEC_559_COMPLEX 2 2025-05-07T19:46:31.4629867Z #define _GLIBCXX_USE_FCHMOD 1 2025-05-07T19:46:31.4630169Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:46:31.4631065Z #define __bswap_16(x) (__extension__ ({ unsigned short int __v, __x = (unsigned short int) (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_16 (__x); else 
__asm__ ("rorw $8, %w0" : "=r" (__v) : "0" (__x) : "cc"); __v; })) 2025-05-07T19:46:31.4631976Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:46:31.4632314Z #define __SIZEOF_FLOAT80__ 16 2025-05-07T19:46:31.4632625Z #define cudaTextureTypeCubemapLayered 0xFC 2025-05-07T19:46:31.4632965Z #define _T_WCHAR_ 2025-05-07T19:46:31.4633211Z #define stdout stdout 2025-05-07T19:46:31.4633552Z #define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11"))) 2025-05-07T19:46:31.4633970Z #define CHAR_BIT __CHAR_BIT__ 2025-05-07T19:46:31.4634231Z #define __flexarr [] 2025-05-07T19:46:31.4634487Z #define _GLIBCXX_HAVE_FINITEF 1 2025-05-07T19:46:31.4634819Z #define __islower_l(c,l) __isctype_l((c), _ISlower, (l)) 2025-05-07T19:46:31.4635193Z #define _IO_FLAGS2_USER_WBUF 8 2025-05-07T19:46:31.4635456Z #define _MATH_H 1 2025-05-07T19:46:31.4635755Z #define cudaOccupancyDisableCachingOverride 0x01 2025-05-07T19:46:31.4636114Z #define __S64_TYPE long int 2025-05-07T19:46:31.4636388Z #define __stub_fchflags 2025-05-07T19:46:31.4636672Z #define cudaDeviceScheduleMask 0x07 2025-05-07T19:46:31.4636975Z #define __SQUAD_TYPE long int 2025-05-07T19:46:31.4637263Z #define __INTMAX_C(c) c ## L 2025-05-07T19:46:31.4637584Z #define cudaStreamFireAndForget ((cudaStream_t)0x4) 2025-05-07T19:46:31.4637954Z #define _BSD_SIZE_T_DEFINED_ 2025-05-07T19:46:31.4638225Z #define NL_NMAX INT_MAX 2025-05-07T19:46:31.4638482Z #define _BITS_TIME_H 1 2025-05-07T19:46:31.4638766Z #define M_LN10l 2.302585092994045684017991454684364208L 2025-05-07T19:46:31.4639118Z #define _GLIBCXX_TXN_SAFE_DYN 2025-05-07T19:46:31.4639449Z #define cudaStreamTailLaunch ((cudaStream_t)0x3) 2025-05-07T19:46:31.4639818Z #define M_El 2.718281828459045235360287471352662498L 2025-05-07T19:46:31.4640252Z #define _PSTL_PRAGMA_DECLARE_SIMD _PSTL_PRAGMA(omp declare simd) 2025-05-07T19:46:31.4640789Z #define __CHAR_BIT__ 8 2025-05-07T19:46:31.4641076Z #define __FSWORD_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.4641476Z 
#define _PSTL_STRING_CONCAT(x,y) x #y 2025-05-07T19:46:31.4641801Z #define _GLIBCXX98_USE_C99_MATH 1 2025-05-07T19:46:31.4642077Z #define FP_NAN 0 2025-05-07T19:46:31.4642363Z #define makedev(maj,min) gnu_dev_makedev (maj, min) 2025-05-07T19:46:31.4642802Z #define cudaGetDeviceProperties cudaGetDeviceProperties_v2 2025-05-07T19:46:31.4643232Z #define __cudaCDP2GetErrorString 2025-05-07T19:46:31.4643542Z #define SHRT_MAX __SHRT_MAX__ 2025-05-07T19:46:31.4643813Z #define _GLIBCXX_X86_RDSEED 1 2025-05-07T19:46:31.4644195Z #define __SM_80_RT_H__ 2025-05-07T19:46:31.4644420Z #define _NEW 2025-05-07T19:46:31.4644661Z #define CLOCK_PROCESS_CPUTIME_ID 2 2025-05-07T19:46:31.4644947Z #define __UINT8_MAX__ 0xff 2025-05-07T19:46:31.4645344Z #define _PSTL_ASSERT_MSG(_Condition,_Message) __glibcxx_assert(_Condition) 2025-05-07T19:46:31.4645767Z #define __SCHAR_WIDTH__ 8 2025-05-07T19:46:31.4646034Z #define __USE_ANSI 1 2025-05-07T19:46:31.4646331Z #define _IO_BE(expr,res) __builtin_expect ((expr), res) 2025-05-07T19:46:31.4646755Z #define __isupper_l(c,l) __isctype_l((c), _ISupper, (l)) 2025-05-07T19:46:31.4647143Z #define __cudaCDP2Memcpy2DAsync_ptsz 2025-05-07T19:46:31.4647452Z #define __WINT_MAX__ 0xffffffffU 2025-05-07T19:46:31.4647753Z #define __SIZEOF_PTHREAD_ATTR_T 56 2025-05-07T19:46:31.4648043Z #define __FLT32_MIN_EXP__ (-125) 2025-05-07T19:46:31.4648345Z #define _GLIBCXX_END_NAMESPACE_LDBL 2025-05-07T19:46:31.4648633Z #define PIPE_BUF 4096 2025-05-07T19:46:31.4648979Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC_2ARGS(PRM1,PRM2) 2025-05-07T19:46:31.4649459Z #define _GLIBCXX_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_NAMESPACE_CXX11 2025-05-07T19:46:31.4649869Z #define ADJ_TICK 0x4000 2025-05-07T19:46:31.4650165Z #define _PSTL_VERSION_PATCH (_PSTL_VERSION % 10) 2025-05-07T19:46:31.4650495Z #define MQ_PRIO_MAX 32768 2025-05-07T19:46:31.4650778Z #define __SIZEOF_PTHREAD_MUTEXATTR_T 4 2025-05-07T19:46:31.4651102Z #define __WAIT_INT(status) (*(int *) &(status)) 
2025-05-07T19:46:31.4651623Z #define __GLIBC_PREREQ(maj,min) ((__GLIBC__ << 16) + __GLIBC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:31.4652206Z #define cudaCooperativeLaunchMultiDeviceNoPreSync 0x01 2025-05-07T19:46:31.4652766Z #define _XOPEN_SOURCE 700 2025-05-07T19:46:31.4653027Z #define _POSIX2_BC_DIM_MAX 2048 2025-05-07T19:46:31.4653350Z #define __VECTOR_FUNCTIONS_HPP__ 2025-05-07T19:46:31.4653662Z #define __cpp_static_assert 201411L 2025-05-07T19:46:31.4653954Z #define __GLIBCXX__ 20230528 2025-05-07T19:46:31.4654241Z #define _GLIBCXX_HAVE_STRXFRM_L 1 2025-05-07T19:46:31.4654529Z #define _POSIX_TTY_NAME_MAX 9 2025-05-07T19:46:31.4654839Z #define _GLIBCXX_USE_WEAK_REF __GXX_WEAK__ 2025-05-07T19:46:31.4655157Z #define __OFF_T_MATCHES_OFF64_T 1 2025-05-07T19:46:31.4655468Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:46:31.4655782Z #define __SIZE_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.4656177Z #define __ispunct_l(c,l) __isctype_l((c), _ISpunct, (l)) 2025-05-07T19:46:31.4656558Z #define __WCHAR_MAX__ 0x7fffffff 2025-05-07T19:46:31.4656854Z #define _GLIBCXX_USE_CLOCK_MONOTONIC 1 2025-05-07T19:46:31.4657197Z #define __BLKCNT_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.4657570Z #define __isprint_l(c,l) __isctype_l((c), _ISprint, (l)) 2025-05-07T19:46:31.4657963Z #define cudaNvSciSyncAttrSignal 0x1 2025-05-07T19:46:31.4658274Z #define _GLIBCXX_USE_LONG_LONG 1 2025-05-07T19:46:31.4658592Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:46:31.4658933Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:46:31.4659286Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:46:31.4659812Z #define __DBL_DENORM_MIN__ double(4.94065645841246544176568792868221372e-324L) 2025-05-07T19:46:31.4660322Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:46:31.4660657Z #define ADJ_ESTERROR 0x0008 2025-05-07T19:46:31.4660940Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:31.4661332Z #define __GCC_IEC_559 2 
2025-05-07T19:46:31.4661636Z #define __cpp_lib_transformation_trait_aliases 201304 2025-05-07T19:46:31.4662065Z #define _IO_flockfile(_fp) 2025-05-07T19:46:31.4662339Z #define CLOCK_MONOTONIC_RAW 4 2025-05-07T19:46:31.4662631Z #define __FLT32X_DECIMAL_DIG__ 17 2025-05-07T19:46:31.4662901Z #define _IOFBF 0 2025-05-07T19:46:31.4663132Z #define __USE_BSD 1 2025-05-07T19:46:31.4663381Z #define __FLT_EVAL_METHOD__ 0 2025-05-07T19:46:31.4663659Z #define SHRT_MIN (-SHRT_MAX - 1) 2025-05-07T19:46:31.4663957Z #define _IO_USER_LOCK 0x8000 2025-05-07T19:46:31.4664218Z #define _IO_NO_WRITES 8 2025-05-07T19:46:31.4664492Z #define _GLIBCXX_PSEUDO_VISIBILITY(V) 2025-05-07T19:46:31.4664860Z #define __ASMNAME2(prefix,cname) __STRING (prefix) cname 2025-05-07T19:46:31.4665246Z #define _GLIBCXX_HAVE_SYS_STAT_H 1 2025-05-07T19:46:31.4665561Z #define MB_CUR_MAX (__ctype_get_mb_cur_max ()) 2025-05-07T19:46:31.4665910Z #define __cpp_binary_literals 201304L 2025-05-07T19:46:31.4666227Z #define _CPP_TYPE_TRAITS_H 1 2025-05-07T19:46:31.4666506Z #define __BEGIN_NAMESPACE_C99 2025-05-07T19:46:31.4666804Z #define __FLT64_DECIMAL_DIG__ 17 2025-05-07T19:46:31.4667131Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(A) 2025-05-07T19:46:31.4667551Z #define _G_HAVE_ST_BLKSIZE defined (_STATBUF_ST_BLKSIZE) 2025-05-07T19:46:31.4667938Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:46:31.4668273Z #define M_PI 3.14159265358979323846 2025-05-07T19:46:31.4668594Z #define _GLIBCXX_PACKAGE_NAME "package-unused" 2025-05-07T19:46:31.4668950Z #define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 2025-05-07T19:46:31.4669269Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:31.4669601Z #define _POSIX_DELAYTIMER_MAX 32 2025-05-07T19:46:31.4669898Z #define _GLIBCXX_USE_UTIME 1 2025-05-07T19:46:31.4670176Z #define _STL_ITERATOR_BASE_FUNCS_H 1 2025-05-07T19:46:31.4670834Z #define _IO_peekc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) && __underflow (_fp) == EOF ? 
EOF : *(unsigned char *) (_fp)->_IO_read_ptr) 2025-05-07T19:46:31.4671598Z #define _GLIBCXX_TR1_ELL_INTEGRAL_TCC 1 2025-05-07T19:46:31.4671952Z #define w_termsig __wait_terminated.__w_termsig 2025-05-07T19:46:31.4672286Z #define __FLOAT_WORD_ORDER __BYTE_ORDER 2025-05-07T19:46:31.4672603Z #define __cudaCDP2GetErrorName 2025-05-07T19:46:31.4672893Z #define XATTR_SIZE_MAX 65536 2025-05-07T19:46:31.4673161Z #define be64toh(x) __bswap_64 (x) 2025-05-07T19:46:31.4673487Z #define __ASSERT_VOID_CAST static_cast 2025-05-07T19:46:31.4673815Z #define __cpp_variadic_templates 200704L 2025-05-07T19:46:31.4674132Z #define RAND_MAX 2147483647 2025-05-07T19:46:31.4674399Z #define _GLIBCXX_USE_C99_COMPLEX_TR1 1 2025-05-07T19:46:31.4674741Z #define __UINT_FAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.4675062Z #define __SM_90_RT_H__ 2025-05-07T19:46:31.4675320Z #define __SIG_ATOMIC_TYPE__ int 2025-05-07T19:46:31.4675583Z #define __COMPAR_FN_T 2025-05-07T19:46:31.4675839Z #define __GID_T_TYPE __U32_TYPE 2025-05-07T19:46:31.4676118Z #define _IO_BAD_SEEN 0x4000 2025-05-07T19:46:31.4676624Z #define _PSTL_PRAGMA_MESSAGE_IMPL(x) _PSTL_PRAGMA(message(_PSTL_STRING_CONCAT(_PSTL_PRAGMA_LOCATION, x))) 2025-05-07T19:46:31.4677173Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:46:31.4677545Z #define __glibcxx_requires_sorted_pred(_First,_Last,_Pred) 2025-05-07T19:46:31.4677934Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:46:31.4678241Z #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) 2025-05-07T19:46:31.4678605Z #define cudaArrayColorAttachment 0x20 2025-05-07T19:46:31.4678940Z #define __cpp_variable_templates 201304L 2025-05-07T19:46:31.4679465Z #define cudaKernelNodeAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:31.4680047Z #define __cpp_lib_integral_constant_callable 201304 2025-05-07T19:46:31.4680391Z #define _GLIBCXX_HAVE_SINHF 1 2025-05-07T19:46:31.4680690Z #define MOD_TIMECONST ADJ_TIMECONST 2025-05-07T19:46:31.4680997Z #define 
__cpp_lib_result_of_sfinae 201210 2025-05-07T19:46:31.4681325Z #define __SM_30_INTRINSICS_H__ 2025-05-07T19:46:31.4681670Z #define __FLT32X_MAX_EXP__ 1024 2025-05-07T19:46:31.4681959Z #define _GLIBCXX_USE_WCHAR_T 1 2025-05-07T19:46:31.4682285Z #define _GLIBCXX_MATH_H 1 2025-05-07T19:46:31.4682553Z #define __u_char_defined 2025-05-07T19:46:31.4682894Z #define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status)) 2025-05-07T19:46:31.4683269Z #define STA_PPSERROR 0x0800 2025-05-07T19:46:31.4683549Z #define _GLIBCXX_STD_A std 2025-05-07T19:46:31.4683805Z #define __FLT32_HAS_DENORM__ 1 2025-05-07T19:46:31.4684108Z #define _GLIBCXX_BEGIN_NAMESPACE_VERSION 2025-05-07T19:46:31.4684575Z #define __device_builtin_texture_type__ __location__(device_builtin_texture_type) 2025-05-07T19:46:31.4685039Z #define FP_INFINITE 1 2025-05-07T19:46:31.4685427Z #define _GLIBCXX11_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:31.4685883Z #define _IO_pid_t __pid_t 2025-05-07T19:46:31.4686160Z #define __UINT_FAST8_MAX__ 0xff 2025-05-07T19:46:31.4686421Z #define __LEAF , __leaf__ 2025-05-07T19:46:31.4686680Z #define PATH_MAX 4096 2025-05-07T19:46:31.4686932Z #define __cpp_rvalue_reference 200610L 2025-05-07T19:46:31.4687290Z #define __LDBL_REDIR1(name,proto,alias) name proto 2025-05-07T19:46:31.4687618Z #define _LIMITS_H___ 2025-05-07T19:46:31.4687853Z #define __size_t 2025-05-07T19:46:31.4688080Z #define _GLIBCXX_HAVE_FREXPF 1 2025-05-07T19:46:31.4688664Z #define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK) 2025-05-07T19:46:31.4689279Z #define _GLIBCXX_HAVE_FREXPL 1 2025-05-07T19:46:31.4689589Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:46:31.4689941Z #define __DEC64_MAX_EXP__ 385 2025-05-07T19:46:31.4690203Z #define _WCHAR_T_DEFINED 2025-05-07T19:46:31.4690583Z #define __glibcxx_requires_can_decrement_range(_First1,_Last1,_First2) 2025-05-07T19:46:31.4690999Z #define 
MOD_STATUS ADJ_STATUS 2025-05-07T19:46:31.4691314Z #define _GLIBCXX_PURE __attribute__ ((__pure__)) 2025-05-07T19:46:31.4691650Z #define _GLIBCXX_HAVE_STDINT_H 1 2025-05-07T19:46:31.4691956Z #define __SIZEOF_PTHREAD_CONDATTR_T 4 2025-05-07T19:46:31.4692242Z #define __INT8_C(c) c 2025-05-07T19:46:31.4692513Z #define __cudaCDP2GetParameterBuffer 2025-05-07T19:46:31.4692826Z #define _GLIBCXX_HAVE_COSHF 1 2025-05-07T19:46:31.4693093Z #define _GLIBCXX_HAVE_COSHL 1 2025-05-07T19:46:31.4693368Z #define __SM_70_RT_HPP__ 2025-05-07T19:46:31.4693616Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:46:31.4693906Z #define __cpp_variadic_using 201611L 2025-05-07T19:46:31.4694233Z #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.4694581Z #define __INT_LEAST8_MAX__ 0x7f 2025-05-07T19:46:31.4694857Z #define __SM_61_INTRINSICS_HPP__ 2025-05-07T19:46:31.4695148Z #define _IO_FLAGS2_MMAP 1 2025-05-07T19:46:31.4695425Z #define __cpp_capture_star_this 201603L 2025-05-07T19:46:31.4695742Z #define __cudaCDP2LaunchDeviceV2_ptsz 2025-05-07T19:46:31.4696064Z #define _GLIBCXX_HAVE_ENDIAN_H 1 2025-05-07T19:46:31.4696435Z #define __always_inline __inline __attribute__ ((__always_inline__)) 2025-05-07T19:46:31.4696843Z #define NFDBITS __NFDBITS 2025-05-07T19:46:31.4697104Z #define _PSTL_PRAGMA_FORCEINLINE 2025-05-07T19:46:31.4697411Z #define _GLIBCXX_HAVE_SYS_STATVFS_H 1 2025-05-07T19:46:31.4697737Z #define __glibcxx_requires_sorted(_First,_Last) 2025-05-07T19:46:31.4698075Z #define __SHRT_MAX__ 0x7fff 2025-05-07T19:46:31.4698332Z #define _GLIBCXX_SYMVER_GNU 1 2025-05-07T19:46:31.4698634Z #define w_stopval __wait_stopped.__w_stopval 2025-05-07T19:46:31.4698956Z #define STA_UNSYNC 0x0040 2025-05-07T19:46:31.4699271Z #define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:46:31.4699960Z #define _GLIBCXX_USE_C99_COMPLEX _GLIBCXX11_USE_C99_COMPLEX 2025-05-07T19:46:31.4701040Z #define __FLT64X_MAX_10_EXP__ 4932 2025-05-07T19:46:31.4701358Z #define 
__cpp_if_constexpr 201606L 2025-05-07T19:46:31.4701690Z #define __glibcxx_class_requires4(_a,_b,_c,_d,_e) 2025-05-07T19:46:31.4702054Z #define _GLIBCXX_HAVE_WCHAR_H 1 2025-05-07T19:46:31.4702556Z #define _GLIBCXX_USE_C99_STDIO _GLIBCXX11_USE_C99_STDIO 2025-05-07T19:46:31.4702927Z #define __daddr_t_defined 2025-05-07T19:46:31.4703295Z #define __LDBL_IS_IEC_60559__ 2 2025-05-07T19:46:31.4703580Z #define _GLIBCXX_TR1_RIEMANN_ZETA_TCC 1 2025-05-07T19:46:31.4703929Z #define _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE 1 2025-05-07T19:46:31.4704479Z #define _PSTL_CPP11_STD_ROTATE_BROKEN ((__GLIBCXX__ && __GLIBCXX__ < 20150716) || (_MSC_VER && _MSC_VER < 1800)) 2025-05-07T19:46:31.4705027Z #define _ACRTIMP 2025-05-07T19:46:31.4705254Z #define _IO_EOF_SEEN 0x10 2025-05-07T19:46:31.4705543Z #define _GLIBCXX_TR1_POLY_LAGUERRE_TCC 1 2025-05-07T19:46:31.4705844Z #define _IOS_BIN 128 2025-05-07T19:46:31.4706229Z #define __fortify_function __extern_always_inline __attribute_artificial__ 2025-05-07T19:46:31.4706686Z #define __FLT64X_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.4706964Z #define UNDERFLOW 4 2025-05-07T19:46:31.4707205Z #define NAME_MAX 255 2025-05-07T19:46:31.4707445Z #define SCHAR_MAX __SCHAR_MAX__ 2025-05-07T19:46:31.4707743Z #define __UINT_LEAST8_MAX__ 0xff 2025-05-07T19:46:31.4708033Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:31.4708359Z #define _IO_UNIFIED_JUMPTABLES 1 2025-05-07T19:46:31.4708757Z #define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128 2025-05-07T19:46:31.4709192Z #define __ptr_t void * 2025-05-07T19:46:31.4709435Z #define M_E 2.7182818284590452354 2025-05-07T19:46:31.4709734Z #define cudaSurfaceType1D 0x01 2025-05-07T19:46:31.4710027Z #define __USE_ISOCXX11 1 2025-05-07T19:46:31.4710301Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:46:31.4710644Z #define cudaDeviceBlockingSync 0x04 2025-05-07T19:46:31.4710948Z #define CLOCK_MONOTONIC_COARSE 6 2025-05-07T19:46:31.4711248Z #define _GLIBCXX_OS_DEFINES 1 
2025-05-07T19:46:31.4711545Z #define _GLIBCXX_NODISCARD [[__nodiscard__]] 2025-05-07T19:46:31.4711891Z #define cudaSurfaceType2D 0x02 2025-05-07T19:46:31.4712157Z #define __linux 1 2025-05-07T19:46:31.4712407Z #define __DEC32_EPSILON__ 1E-6DF 2025-05-07T19:46:31.4712698Z #define cudaDeviceMask 0xff 2025-05-07T19:46:31.4712992Z #define _GLIBCXX_END_NAMESPACE_ALGO 2025-05-07T19:46:31.4713318Z #define __CUDA_API_VER_MAJOR__ 12 2025-05-07T19:46:31.4713609Z #define htobe16(x) __bswap_16 (x) 2025-05-07T19:46:31.4713927Z #define HUGE_VALF (__builtin_huge_valf()) 2025-05-07T19:46:31.4714251Z #define __FLT_EVAL_METHOD_TS_18661_3__ 0 2025-05-07T19:46:31.4714592Z #define HUGE_VALL (__builtin_huge_vall()) 2025-05-07T19:46:31.4714901Z #define _BITS_TYPES_H 1 2025-05-07T19:46:31.4715217Z #define ULONG_LONG_MAX (LONG_LONG_MAX * 2ULL + 1ULL) 2025-05-07T19:46:31.4715578Z #define _IO_cleanup_region_end(_Doit) 2025-05-07T19:46:31.4715911Z #define cudaSurfaceType3D 0x03 2025-05-07T19:46:31.4716218Z #define _GLIBCXX_HAVE_SYS_TIME_H 1 2025-05-07T19:46:31.4716522Z #define __cudaGet_blockIdx() blockIdx 2025-05-07T19:46:31.4716841Z #define _IO_DONT_CLOSE 0100000 2025-05-07T19:46:31.4717706Z #define __MATHDECLX(type,function,suffix,args,attrib) __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib); __MATHDECL_1(type, __CONCAT(__,function),suffix, args) __attribute__ (attrib) 2025-05-07T19:46:31.4718648Z #define cudaHostRegisterDefault 0x00 2025-05-07T19:46:31.4718946Z #define __unix 1 2025-05-07T19:46:31.4719190Z #define MATH_ERRNO 1 2025-05-07T19:46:31.4719442Z #define _GLIBCXX_STDIO_SEEK_END 2 2025-05-07T19:46:31.4719754Z #define _GLIBCXX_USE_FCHMODAT 1 2025-05-07T19:46:31.4720044Z #define __SM_100_RT_H__ 2025-05-07T19:46:31.4720304Z #define __UINT32_MAX__ 0xffffffffU 2025-05-07T19:46:31.4720623Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:46:31.4720927Z #define __UID_T_TYPE __U32_TYPE 2025-05-07T19:46:31.4721222Z #define _GLIBCXX20_DEPRECATED(MSG) 
2025-05-07T19:46:31.4721537Z #define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1 2025-05-07T19:46:31.4722044Z #define __CUDART_API_VERSION ((__CUDA_API_VER_MAJOR__ * 1000) + (__CUDA_API_VER_MINOR__ * 10)) 2025-05-07T19:46:31.4722545Z #define __nv_pure__ __location__(nv_pure) 2025-05-07T19:46:31.4722868Z #define CUDARTAPI_CDECL 2025-05-07T19:46:31.4723225Z #define _PSTL_USAGE_WARNINGS 0 2025-05-07T19:46:31.4742448Z #define _GLIBCXX98_USE_C99_COMPLEX 1 2025-05-07T19:46:31.4742923Z #define __cpp_lib_void_t 201411 2025-05-07T19:46:31.4743234Z #define _POSIX_AIO_MAX 1 2025-05-07T19:46:31.4743498Z #define __SIZE_T 2025-05-07T19:46:31.4743760Z #define isgraph_l(c,l) __isgraph_l ((c), (l)) 2025-05-07T19:46:31.4744123Z #define _GLIBCXX_FULLY_DYNAMIC_STRING 0 2025-05-07T19:46:31.4744434Z #define _POSIX_PIPE_BUF 512 2025-05-07T19:46:31.4744725Z #define __CUDA_RUNTIME_API_H__ 2025-05-07T19:46:31.4745007Z #define _GLIBCXX_HAVE_STRTOLD 1 2025-05-07T19:46:31.4745296Z #define _ATFILE_SOURCE 1 2025-05-07T19:46:31.4745711Z #define __glibcxx_assert(cond) do { __glibcxx_constexpr_assert(cond); } while (false) 2025-05-07T19:46:31.4746195Z #define __WAIT_STATUS void * 2025-05-07T19:46:31.4746486Z #define __MATH_FUNCTIONS_H__ 2025-05-07T19:46:31.4746765Z #define _GLIBCXX_HAVE_WCSTOF 1 2025-05-07T19:46:31.4747061Z #define __FLT128_MIN_EXP__ (-16381) 2025-05-07T19:46:31.4747370Z #define _GLIBCXX_HAVE_LC_MESSAGES 1 2025-05-07T19:46:31.4747677Z #define __WINT_MIN__ 0U 2025-05-07T19:46:31.4748308Z #define _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT (!__INTEL_COMPILER || __INTEL_COMPILER >= 1700) && (_MSC_FULL_VER >= 190023918 || __cplusplus >= 201402L) 2025-05-07T19:46:31.4749035Z #define isdigit_l(c,l) __isdigit_l ((c), (l)) 2025-05-07T19:46:31.4749352Z #define WUNTRACED 2 2025-05-07T19:46:31.4749610Z #define _GLIBCXX_HAVE_SQRTF 1 2025-05-07T19:46:31.4749916Z #define __SIZEOF_PTHREAD_RWLOCKATTR_T 8 2025-05-07T19:46:31.4750214Z #define NZERO 20 2025-05-07T19:46:31.4750468Z #define _GLIBCXX_HAVE_MEMALIGN 1 
2025-05-07T19:46:31.4750758Z #define _PSTL_PRAGMA(x) _Pragma(#x) 2025-05-07T19:46:31.4751077Z #define MOD_CLKA ADJ_OFFSET_SINGLESHOT 2025-05-07T19:46:31.4751378Z #define MOD_CLKB ADJ_TICK 2025-05-07T19:46:31.4751652Z #define __FLT128_MIN_10_EXP__ (-4931) 2025-05-07T19:46:31.4752050Z #define __FLT32X_IS_IEC_60559__ 2 2025-05-07T19:46:31.4752343Z #define __DEVICE_FUNCTIONS_H__ 2025-05-07T19:46:31.4752626Z #define SCHAR_MIN (-SCHAR_MAX - 1) 2025-05-07T19:46:31.4752921Z #define EXIT_FAILURE 1 2025-05-07T19:46:31.4753177Z #define ADJ_MAXERROR 0x0004 2025-05-07T19:46:31.4753436Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:46:31.4753719Z #define _SIZE_T_DEFINED_ 2025-05-07T19:46:31.4753982Z #define _POSIX_AIO_LISTIO_MAX 2 2025-05-07T19:46:31.4754428Z #define __cudaCDP2DeviceGetLimit 2025-05-07T19:46:31.4754790Z #define __LDBL_REDIR_NTH(name,proto) name proto __THROW 2025-05-07T19:46:31.4755198Z #define __cudaCDP2FuncGetAttributes 2025-05-07T19:46:31.4755515Z #define __SCHAR_MAX__ 0x7f 2025-05-07T19:46:31.4755821Z #define __FLT128_MANT_DIG__ 113 2025-05-07T19:46:31.4756111Z #define __USING_NAMESPACE_STD(name) 2025-05-07T19:46:31.4756461Z #define _GLIBCXX_HAVE_OBSOLETE_ISINF 1 2025-05-07T19:46:31.4756823Z #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1) 2025-05-07T19:46:31.4757135Z #define SEEK_DATA 3 2025-05-07T19:46:31.4757381Z #define __KERNEL_STRICT_NAMES 2025-05-07T19:46:31.4757683Z #define _IO_stderr ((_IO_FILE*)(&_IO_2_1_stderr_)) 2025-05-07T19:46:31.4758117Z #define _IO_ferror_unlocked(__fp) (((__fp)->_flags & _IO_ERR_SEEN) != 0) 2025-05-07T19:46:31.4758504Z #define _FUNCTEXCEPT_H 1 2025-05-07T19:46:31.4758763Z #define __INT64_C(c) c ## L 2025-05-07T19:46:31.4759022Z #define __NTH(fct) __LEAF_ATTR fct throw () 2025-05-07T19:46:31.4759363Z #define _GLIBCXX_CONST __attribute__ ((__const__)) 2025-05-07T19:46:31.4759694Z #define _GLIBCXX_HAVE_LINK 1 2025-05-07T19:46:31.4759975Z #define cudaNvSciSyncAttrWait 0x2 2025-05-07T19:46:31.4760322Z #define 
__GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:31.4760642Z #define STA_PPSWANDER 0x0400 2025-05-07T19:46:31.4760952Z #define __INT_WCHAR_T_H 2025-05-07T19:46:31.4761212Z #define WSTOPPED 2 2025-05-07T19:46:31.4761500Z #define _POSIX_THREAD_THREADS_MAX 64 2025-05-07T19:46:31.4761814Z #define _POSIX_MQ_OPEN_MAX 8 2025-05-07T19:46:31.4762117Z #define FP_NORMAL 4 2025-05-07T19:46:31.4762378Z #define __cudaCDP2LaunchDevice_ptsz 2025-05-07T19:46:31.4762811Z #define _BITS_TIMEX_H 1 2025-05-07T19:46:31.4763081Z #define _POSIX_LINK_MAX 8 2025-05-07T19:46:31.4763449Z #define _GLIBCXX_HAVE_LIMIT_FSIZE 1 2025-05-07T19:46:31.4763785Z #define _GLIBCXX_HAVE_ATAN2F 1 2025-05-07T19:46:31.4764077Z #define cudaTextureType1D 0x01 2025-05-07T19:46:31.4764399Z #define _GLIBCXX_HAVE_ATAN2L 1 2025-05-07T19:46:31.4764688Z #define COLL_WEIGHTS_MAX 255 2025-05-07T19:46:31.4765011Z #define __isascii(c) (((c) & ~0x7f) == 0) 2025-05-07T19:46:31.4765327Z #define __toascii(c) ((c) & 0x7f) 2025-05-07T19:46:31.4765819Z #define __attribute_format_strfmon__(a,b) __attribute__ ((__format__ (__strfmon__, a, b))) 2025-05-07T19:46:31.4766295Z #define _IO_MAGIC 0xFBAD0000 2025-05-07T19:46:31.4766612Z #define _GLIBCXX_USE_SENDFILE 1 2025-05-07T19:46:31.4766932Z #define _POSIX_SOURCE 1 2025-05-07T19:46:31.4767198Z #define cudaTextureType2D 0x02 2025-05-07T19:46:31.4767511Z #define _PTR_TRAITS_H 1 2025-05-07T19:46:31.4767802Z #define _GLIBCXX_NOEXCEPT_QUAL noexcept (_NE) 2025-05-07T19:46:31.4768168Z #define _GLIBCXX_HAVE_POWF 1 2025-05-07T19:46:31.4768454Z #define _POSIX2_BC_STRING_MAX 1000 2025-05-07T19:46:31.4768828Z #define __attribute_used__ __attribute__ ((__used__)) 2025-05-07T19:46:31.4769187Z #define cudaTextureType3D 0x03 2025-05-07T19:46:31.4769509Z #define _STDIO_USES_IOSTREAM 2025-05-07T19:46:31.4769791Z #define CLOCK_REALTIME 0 2025-05-07T19:46:31.4770090Z #define __FLT32X_MANT_DIG__ 53 2025-05-07T19:46:31.4770417Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:31.4770739Z #define 
__cpp_aligned_new 201606L 2025-05-07T19:46:31.4771077Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:46:31.4771376Z #define cudaEventBlockingSync 0x01 2025-05-07T19:46:31.4771719Z #define _GLIBCXX_HAVE_TANL 1 2025-05-07T19:46:31.4772022Z #define _GLIBCXX_USE_PTHREAD_RWLOCK_T 1 2025-05-07T19:46:31.4772376Z #define _GLIBCXX_HAVE_LINUX_RANDOM_H 1 2025-05-07T19:46:31.4772694Z #define _GLIBCXX_USE_C99_FENV_TR1 1 2025-05-07T19:46:31.4773018Z #define __FLT32_MAX_10_EXP__ 38 2025-05-07T19:46:31.4773267Z #define __GLIBC__ 2 2025-05-07T19:46:31.4773492Z #define __END_DECLS } 2025-05-07T19:46:31.4773744Z #define FP_ILOGB0 (-2147483647 - 1) 2025-05-07T19:46:31.4774100Z #define __FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x 2025-05-07T19:46:31.4774489Z #define __CONCAT(x,y) x ## y 2025-05-07T19:46:31.4774733Z #define WCONTINUED 8 2025-05-07T19:46:31.4774971Z #define __STDC_HOSTED__ 1 2025-05-07T19:46:31.4775218Z #define _GLIBCXX_HAVE_ARPA_INET_H 1 2025-05-07T19:46:31.4775499Z #define _ALLOCA_H 1 2025-05-07T19:46:31.4775722Z #define __host__ __location__(host) 2025-05-07T19:46:31.4776158Z #define __warndecl(name,msg) extern void name (void) __attribute__((__warning__ (msg))) 2025-05-07T19:46:31.4776918Z #define __SLONG32_TYPE int 2025-05-07T19:46:31.4777176Z #define _GLIBCXX_DEBUG_ASSERTIONS_H 1 2025-05-07T19:46:31.4777470Z #define _SYS_SELECT_H 1 2025-05-07T19:46:31.4777719Z #define _IO_LINE_BUF 0x200 2025-05-07T19:46:31.4777971Z #define _IOS_NOCREATE 32 2025-05-07T19:46:31.4778215Z #define __DEC64_MIN_EXP__ (-382) 2025-05-07T19:46:31.4778502Z #define __cudaGet_warpSize() warpSize 2025-05-07T19:46:31.4778790Z #define __SSIZE_T_TYPE __SWORD_TYPE 2025-05-07T19:46:31.4779083Z #define _GLIBCXX_HAVE_LIMIT_VMEM 0 2025-05-07T19:46:31.4779455Z #define __global__ __location__(global) 2025-05-07T19:46:31.4779924Z #define __GNU_LIBRARY__ 6 2025-05-07T19:46:31.4780208Z #define __cpp_decltype_auto 201304L 2025-05-07T19:46:31.4780495Z #define __DBL_DIG__ 15 
2025-05-07T19:46:31.4780755Z #define TIME_UTC 1 2025-05-07T19:46:31.4780981Z #define __FLT32_DIG__ 6 2025-05-07T19:46:31.4781339Z #define __forceinline__ __inline__ __attribute__((always_inline)) 2025-05-07T19:46:31.4781763Z #define cudaHostAllocWriteCombined 0x04 2025-05-07T19:46:31.4782112Z #define cudaDeviceScheduleAuto 0x00 2025-05-07T19:46:31.4782434Z #define iscntrl_l(c,l) __iscntrl_l ((c), (l)) 2025-05-07T19:46:31.4782763Z #define _G_BUFSIZ 8192 2025-05-07T19:46:31.4783077Z #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F 2025-05-07T19:46:31.4783573Z #define cudaTextureTypeCubemap 0x0C 2025-05-07T19:46:31.4783977Z #define __cudaCDP2GetDevice 2025-05-07T19:46:31.4784268Z #define __cudaCDP2PeekAtLastError 2025-05-07T19:46:31.4784581Z #define STA_CLOCKERR 0x1000 2025-05-07T19:46:31.4784839Z #define __GXX_WEAK__ 1 2025-05-07T19:46:31.4785135Z #define __RLIM_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:31.4785451Z #define _GLIBCXX_HAVE_ISNANF 1 2025-05-07T19:46:31.4785737Z #define __SHRT_WIDTH__ 16 2025-05-07T19:46:31.4786044Z #define __cpp_lib_robust_nonmodifying_seq_ops 201304 2025-05-07T19:46:31.4786417Z #define _GLIBCXX_BITS_SPECFUN_H 1 2025-05-07T19:46:31.4786726Z #define _GLIBCXX_HAVE_ISNANL 1 2025-05-07T19:46:31.4787024Z #define isblank_l(c,l) __isblank_l ((c), (l)) 2025-05-07T19:46:31.4787352Z #define _G_config_h 1 2025-05-07T19:46:31.4787639Z #define M_LOG2El 1.442695040888963407359924681001892137L 2025-05-07T19:46:31.4788011Z #define ADJ_OFFSET_SINGLESHOT 0x8001 2025-05-07T19:46:31.4788300Z #define _GCC_WCHAR_T 2025-05-07T19:46:31.4788557Z #define TMP_MAX 238328 2025-05-07T19:46:31.4788805Z #define __FLT32_IS_IEC_60559__ 2 2025-05-07T19:46:31.4789104Z #define __DEVICE_TYPES_H__ 2025-05-07T19:46:31.4789377Z #define __DEV_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:31.4789678Z #define _EXT_NUMERIC_TRAITS 1 2025-05-07T19:46:31.4789976Z #define _GLIBCXX_BEGIN_NAMESPACE_ALGO 2025-05-07T19:46:31.4790271Z #define _IO_SKIPWS 01 
2025-05-07T19:46:31.4790709Z #define cudaStreamGraphFireAndForgetAsSibling (cudaStream_t)0x0300000000000000 2025-05-07T19:46:31.4791204Z #define _IO_SCIENTIFIC 04000 2025-05-07T19:46:31.4791498Z #define _GLIBCXX_HAVE_STRING_H 1 2025-05-07T19:46:31.4791844Z #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L 2025-05-07T19:46:31.4792337Z #define cudaDeviceScheduleSpin 0x01 2025-05-07T19:46:31.4792708Z #define __nonnull(params) __attribute__ ((__nonnull__ params)) 2025-05-07T19:46:31.4793079Z #define __DBL_IS_IEC_60559__ 2 2025-05-07T19:46:31.4793340Z #define le32toh(x) (x) 2025-05-07T19:46:31.4793566Z #define _SIZE_T_DEFINED 2025-05-07T19:46:31.4793824Z #define _GLIBCXX_HAVE_XLOCALE_H 1 2025-05-07T19:46:31.4794153Z #define cudaArraySparsePropertiesSingleMipTail 0x1 2025-05-07T19:46:31.4794506Z #define __DEC32_MAX__ 9.999999E96DF 2025-05-07T19:46:31.4794900Z #define __WIFSIGNALED(status) (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) 2025-05-07T19:46:31.4795328Z #define _GLIBCXX_HAVE_FMODL 1 2025-05-07T19:46:31.4795588Z #define _GLIBCXX_HAVE_POLL 1 2025-05-07T19:46:31.4795861Z #define __SM_32_INTRINSICS_H__ 2025-05-07T19:46:31.4796122Z #define _POSIX_NAME_MAX 14 2025-05-07T19:46:31.4796408Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:46:31.4796959Z #define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) std::__make_move_if_noexcept_iterator(_Iter) 2025-05-07T19:46:31.4797465Z #define _GLIBCXX_USE_CLOCK_REALTIME 1 2025-05-07T19:46:31.4797786Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:46:31.4798126Z #define __WCOREDUMP(status) ((status) & __WCOREFLAG) 2025-05-07T19:46:31.4798460Z #define _WCHAR_T_ 2025-05-07T19:46:31.4798681Z #define _GLIBCXX_FAST_MATH 0 2025-05-07T19:46:31.4799054Z #define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x 2025-05-07T19:46:31.4799439Z #define RTSIG_MAX 32 2025-05-07T19:46:31.4799670Z #define _STDDEF_H 2025-05-07T19:46:31.4799908Z #define CU_UUID_HAS_BEEN_DEFINED 
2025-05-07T19:46:31.4800173Z #define _VA_LIST_DEFINED 2025-05-07T19:46:31.4800587Z #define __FLT32X_HAS_INFINITY__ 1 2025-05-07T19:46:31.4801102Z #define __glibcxx_requires_non_empty_range(_First,_Last) 2025-05-07T19:46:31.4801602Z #define __grid_constant__ __location__(grid_constant) 2025-05-07T19:46:31.4801948Z #define __INT32_MAX__ 0x7fffffff 2025-05-07T19:46:31.4802269Z #define _GLIBCXX_BEGIN_EXTERN_C extern "C" { 2025-05-07T19:46:31.4802783Z #define _PSTL_CPP14_INTEGER_SEQUENCE_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L) 2025-05-07T19:46:31.4803354Z #define __glibcxx_digits_b(T,B) (B - __glibcxx_signed_b (T,B)) 2025-05-07T19:46:31.4806181Z #define __SIZEOF_PTHREAD_COND_T 48 2025-05-07T19:46:31.4806521Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC(PRM) 2025-05-07T19:46:31.4806966Z #define __unix__ 1 2025-05-07T19:46:31.4807215Z #define __SM_60_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:31.4807535Z #define __INT_WIDTH__ 32 2025-05-07T19:46:31.4807789Z #define __SIZEOF_LONG__ 8 2025-05-07T19:46:31.4808045Z #define _IONBF 2 2025-05-07T19:46:31.4808538Z #define __MATHCALLX(function,suffix,args,attrib) __MATHDECLX (_Mdouble_,function,suffix, args, attrib) 2025-05-07T19:46:31.4809380Z #define _IO_getc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) ? 
__uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++) 2025-05-07T19:46:31.4809980Z #define __STDC_IEC_559__ 1 2025-05-07T19:46:31.4810245Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:46:31.4810534Z #define __UINT16_C(c) c 2025-05-07T19:46:31.4810783Z #define M_2_PI 0.63661977236758134308 2025-05-07T19:46:31.4811077Z #define STA_DEL 0x0020 2025-05-07T19:46:31.4811331Z #define __CUDACC_VER_MINOR__ 8 2025-05-07T19:46:31.4811608Z #define __id_t_defined 2025-05-07T19:46:31.4811905Z #define w_retcode __wait_terminated.__w_retcode 2025-05-07T19:46:31.4812389Z #define _IO_PENDING_OUTPUT_COUNT(_fp) ((_fp)->_IO_write_ptr - (_fp)->_IO_write_base) 2025-05-07T19:46:31.4812864Z #define _GLIBCXX_HAVE_MODFF 1 2025-05-07T19:46:31.4813137Z #define _GLIBCXX_HAVE_MODFL 1 2025-05-07T19:46:31.4813525Z #define __DECIMAL_DIG__ 21 2025-05-07T19:46:31.4813768Z #define _POSIX2_RE_DUP_MAX 255 2025-05-07T19:46:31.4814035Z #define __USE_FORTIFY_LEVEL 0 2025-05-07T19:46:31.4814289Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:46:31.4814557Z #define SING 2 2025-05-07T19:46:31.4814764Z #define STA_FREQHOLD 0x0080 2025-05-07T19:46:31.4815038Z #define __SM_32_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:31.4815342Z #define cudaStreamDefault 0x00 2025-05-07T19:46:31.4815677Z #define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64 2025-05-07T19:46:31.4816243Z #define _GLIBCXX_HAVE_HYPOTL 1 2025-05-07T19:46:31.4816519Z #define _GLIBCXX_HAVE_SYS_UIO_H 1 2025-05-07T19:46:31.4816803Z #define __gnu_linux__ 1 2025-05-07T19:46:31.4817046Z #define __INT16_MAX__ 0x7fff 2025-05-07T19:46:31.4817317Z #define _LARGEFILE_SOURCE 1 2025-05-07T19:46:31.4817568Z #define MAX_INPUT 255 2025-05-07T19:46:31.4817822Z #define __FLT64_MIN_EXP__ (-1021) 2025-05-07T19:46:31.4818168Z #define __isalpha_l(c,l) __isctype_l((c), _ISalpha, (l)) 2025-05-07T19:46:31.4818552Z #define __glibcxx_requires_heap(_First,_Last) 2025-05-07T19:46:31.4818892Z #define _GLIBCXX_CPU_DEFINES 1 2025-05-07T19:46:31.4819162Z 
#define _GLIBCXX_HAVE_POLL_H 1 2025-05-07T19:46:31.4819666Z #define __attribute_warn_unused_result__ __attribute__ ((__warn_unused_result__)) 2025-05-07T19:46:31.4820289Z #define _IO_SHOWPOS 02000 2025-05-07T19:46:31.4820723Z #define _GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT 1 2025-05-07T19:46:31.4821107Z #define _Mfloat_ float 2025-05-07T19:46:31.4821398Z #define __glibcxx_requires_cond(_Cond,_Msg) 2025-05-07T19:46:31.4821745Z #define __FLT64X_MIN_10_EXP__ (-4931) 2025-05-07T19:46:31.4822044Z #define DELAYTIMER_MAX 2147483647 2025-05-07T19:46:31.4822400Z #define cudaMemPoolCreateUsageHwDecompress 0x2 2025-05-07T19:46:31.4822981Z #define __glibcxx_max_b(T,B) (__glibcxx_signed_b (T,B) ? (((((T)1 << (__glibcxx_digits_b (T,B) - 1)) - 1) << 1) + 1) : ~(T)0) 2025-05-07T19:46:31.4823534Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.4823826Z #define _GLIBCXX98_USE_C99_STDIO 1 2025-05-07T19:46:31.4824184Z #define cudaKernelNodeAttrID cudaLaunchAttributeID 2025-05-07T19:46:31.4824567Z #define __glibcxx_class_requires2(_a,_b,_c) 2025-05-07T19:46:31.4824896Z #define __USE_ISOC11 1 2025-05-07T19:46:31.4825153Z #define _BSD_SIZE_T_ 2025-05-07T19:46:31.4825396Z #define ADJ_MICRO 0x1000 2025-05-07T19:46:31.4825673Z #define _GLIBCXX_HAVE_FABSF 1 2025-05-07T19:46:31.4825949Z #define _GLIBCXX_HAVE_FABSL 1 2025-05-07T19:46:31.4826279Z #define _PSTL_PRAGMA_SIMD _PSTL_PRAGMA(omp simd) 2025-05-07T19:46:31.4826617Z #define __FLT64_MANT_DIG__ 53 2025-05-07T19:46:31.4827049Z #define __attribute_const__ __attribute__ ((__const__)) 2025-05-07T19:46:31.4827459Z #define __THROW throw () 2025-05-07T19:46:31.4827740Z #define __cudaGet_gridDim() gridDim 2025-05-07T19:46:31.4828051Z #define __SM_60_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:31.4828445Z #define __glibcxx_requires_heap_pred(_First,_Last,_Pred) 2025-05-07T19:46:31.4828840Z #define htobe32(x) __bswap_32 (x) 2025-05-07T19:46:31.4829136Z #define _GLIBCXX_HAVE_POWL 1 2025-05-07T19:46:31.4829426Z #define 
__FLT64X_MANT_DIG__ 64 2025-05-07T19:46:31.4829703Z #define __GLIBC_HAVE_LONG_LONG 1 2025-05-07T19:46:31.4829993Z #define L_tmpnam 20 2025-05-07T19:46:31.4830224Z #define ___int_wchar_t_h 2025-05-07T19:46:31.4830600Z #define WIFCONTINUED(status) __WIFCONTINUED (__WAIT_INT (status)) 2025-05-07T19:46:31.4831013Z #define isascii(c) __isascii (c) 2025-05-07T19:46:31.4831301Z #define _T_PTRDIFF 2025-05-07T19:46:31.4831616Z #define _GLIBCXX_MOVE3(_Tp,_Up,_Vp) std::move(_Tp, _Up, _Vp) 2025-05-07T19:46:31.4832123Z #define toascii(c) __toascii (c) 2025-05-07T19:46:31.4832402Z #define __GNUC__ 11 2025-05-07T19:46:31.4832660Z #define __SYSCALL_ULONG_TYPE __ULONGWORD_TYPE 2025-05-07T19:46:31.4832980Z #define __GXX_RTTI 1 2025-05-07T19:46:31.4833201Z #define __pie__ 2 2025-05-07T19:46:31.4833426Z #define __MMX__ 1 2025-05-07T19:46:31.4833648Z #define __cudaCDP2Malloc 2025-05-07T19:46:31.4833915Z #define __timespec_defined 1 2025-05-07T19:46:31.4834168Z #define L_ctermid 9 2025-05-07T19:46:31.4834413Z #define __OFF64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:31.4834721Z #define __cudaCDP2GetParameterBufferV2 2025-05-07T19:46:31.4835135Z #define offsetof(TYPE,MEMBER) __builtin_offsetof (TYPE, MEMBER) 2025-05-07T19:46:31.4835535Z #define _BITS_POSIX2_LIM_H 1 2025-05-07T19:46:31.4835804Z #define _GLIBCXX98_USE_C99_STDLIB 1 2025-05-07T19:46:31.4836113Z #define cudaMemAttachGlobal 0x01 2025-05-07T19:46:31.4836423Z #define FD_SET(fd,fdsetp) __FD_SET (fd, fdsetp) 2025-05-07T19:46:31.4836765Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:46:31.4837040Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:46:31.4837520Z #define _GLIBCXX_NATIVE_THREAD_ID (__gthread_active_p() ? __gthread_self() : (__gthread_t)1) 2025-05-07T19:46:31.4838320Z #define assert_perror(errnum) (!(errnum) ? 
__ASSERT_VOID_CAST (0) : __assert_perror_fail ((errnum), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:31.4838979Z #define _IO_HAVE_ST_BLKSIZE _G_HAVE_ST_BLKSIZE 2025-05-07T19:46:31.4839303Z #define __USE_SVID 1 2025-05-07T19:46:31.4839560Z #define __constant__ __location__(constant) 2025-05-07T19:46:31.4839991Z #define _GLIBCXX_HAVE_POSIX_MEMALIGN 1 2025-05-07T19:46:31.4840279Z #define __device__ __location__(device) 2025-05-07T19:46:31.4840608Z #define _GLIBCXX_HAVE_EXCEPTION_PTR_SINCE_GCC46 1 2025-05-07T19:46:31.4840927Z #define _GLIBCXX_RES_LIMITS 1 2025-05-07T19:46:31.4841197Z #define M_1_PI 0.31830988618379067154 2025-05-07T19:46:31.4841469Z #define CUDART_DEVICE __device__ 2025-05-07T19:46:31.4841823Z #define __LDBL_REDIR1_NTH(name,proto,alias) name proto __THROW 2025-05-07T19:46:31.4842201Z #define M_PI_2 1.57079632679489661923 2025-05-07T19:46:31.4842477Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:46:31.4842844Z #define cudaExternalSemaphoreWaitSkipNvSciBufMemSync 0x02 2025-05-07T19:46:31.4843212Z #define __STDC_UTF_16__ 1 2025-05-07T19:46:31.4843463Z #define LONG_MAX __LONG_MAX__ 2025-05-07T19:46:31.4843820Z #define __glibcxx_digits10_b(T,B) (__glibcxx_digits_b (T,B) * 643L / 2136) 2025-05-07T19:46:31.4844256Z #define _POSIX_THREAD_DESTRUCTOR_ITERATIONS 4 2025-05-07T19:46:31.4844566Z #define _POSIX_HOST_NAME_MAX 255 2025-05-07T19:46:31.4844842Z #define __FLT64_MAX_10_EXP__ 308 2025-05-07T19:46:31.4845111Z #define NGROUPS_MAX 65536 2025-05-07T19:46:31.4845354Z #define _GLIBCXX_NAMESPACE_LDBL 2025-05-07T19:46:31.4845621Z #define __USE_ISOC95 1 2025-05-07T19:46:31.4845838Z #define _TIME_H 1 2025-05-07T19:46:31.4846107Z #define M_LOG10El 0.434294481903251827651128918916605082L 2025-05-07T19:46:31.4846420Z #define __USE_ISOC99 1 2025-05-07T19:46:31.4846823Z #define __ASMNAME(cname) __ASMNAME2 (__USER_LABEL_PREFIX__, cname) 2025-05-07T19:46:31.4847266Z #define HOST_NAME_MAX 64 2025-05-07T19:46:31.4847526Z #define _POSIX_SEM_NSEMS_MAX 256 
2025-05-07T19:46:31.4847775Z #define _IOS_ATEND 4 2025-05-07T19:46:31.4848014Z #define __SM_35_INTRINSICS_H__ 2025-05-07T19:46:31.4848345Z #define WTERMSIG(status) __WTERMSIG (__WAIT_INT (status)) 2025-05-07T19:46:31.4848741Z #define cudaStreamAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:31.4849096Z #define _GLIBCXX_HAVE_S_ISREG 1 2025-05-07T19:46:31.4849369Z #define cudaSurfaceTypeCubemap 0x0C 2025-05-07T19:46:31.4849692Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:46:31.4849999Z #define __FLT32_HAS_INFINITY__ 1 2025-05-07T19:46:31.4850262Z #define _STDIO_H 1 2025-05-07T19:46:31.4850653Z #define __isctype_l(c,type,locale) ((locale)->__ctype_b[(int) (c)] & (unsigned short int) type) 2025-05-07T19:46:31.4851137Z #define _GLIBCXX_PREDEFINED_OPS_H 1 2025-05-07T19:46:31.4851503Z #define __DBL_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:46:31.4851878Z #define _G_IO_IO_FILE_VERSION 0x20001 2025-05-07T19:46:31.4852179Z #define _POSIX_SIGQUEUE_MAX 32 2025-05-07T19:46:31.4852438Z #define _GLIBCXX_HAVE_GETS 1 2025-05-07T19:46:31.4852714Z #define _GLIBCXX_HAVE_LINUX_TYPES_H 1 2025-05-07T19:46:31.4852996Z #define __cpp_raw_strings 200710L 2025-05-07T19:46:31.4853303Z #define __INT_FAST32_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.4853617Z #define _GLIBCXX_HAVE_VFWSCANF 1 2025-05-07T19:46:31.4853902Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:46:31.4854173Z #define __STDCPP_MATH_SPEC_FUNCS__ 201003L 2025-05-07T19:46:31.4854481Z #define _GLIBCXX_STDIO_EOF -1 2025-05-07T19:46:31.4854758Z #define __SIZEOF_PTHREAD_MUTEX_T 40 2025-05-07T19:46:31.4855039Z #define __CHANNEL_DESCRIPTOR_H__ 2025-05-07T19:46:31.4855398Z #define _ISbit(bit) ((bit) < 8 ? 
((1 << (bit)) << 8) : ((1 << (bit)) >> 8)) 2025-05-07T19:46:31.4855762Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:46:31.4856015Z #define __USE_XOPEN 1 2025-05-07T19:46:31.4856247Z #define __SIZEOF_PTHREAD_RWLOCK_T 56 2025-05-07T19:46:31.4856697Z #define cudaStreamAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:31.4857137Z #define __USE_XOPEN2K 1 2025-05-07T19:46:31.4857382Z #define _PSTL_UDR_PRESENT 1 2025-05-07T19:46:31.4857641Z #define __HAVE_SPECULATION_SAFE_VALUE 1 2025-05-07T19:46:31.4857938Z #define _GLIBCXX_HAVE_COSF 1 2025-05-07T19:46:31.4858218Z #define __cpp_fold_expressions 201603L 2025-05-07T19:46:31.4858733Z #define cudaWaitExternalSemaphoresAsync __CUDART_API_PTSZ(cudaWaitExternalSemaphoresAsync_v2) 2025-05-07T19:46:31.4859273Z #define NL_LANGMAX _POSIX2_LINE_MAX 2025-05-07T19:46:31.4859644Z #define __DEC32_MIN_EXP__ (-94) 2025-05-07T19:46:31.4860204Z #define __glibcxx_requires_partitioned_upper(_First,_Last,_Value) 2025-05-07T19:46:31.4860694Z #define __DADDR_T_TYPE __S32_TYPE 2025-05-07T19:46:31.4861108Z #define cudaExternalSemaphoreSignalSkipNvSciBufMemSync 0x01 2025-05-07T19:46:31.4861541Z #define __END_NAMESPACE_C99 2025-05-07T19:46:31.4861839Z #define __glibcxx_integral_traps true 2025-05-07T19:46:31.4862142Z #define _POSIX_PATH_MAX 256 2025-05-07T19:46:31.4862412Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:46:31.4862684Z #define __FLT64X_HAS_INFINITY__ 1 2025-05-07T19:46:31.4862955Z #define _IOS_TRUNC 16 2025-05-07T19:46:31.4863196Z #define _ISOC11_SOURCE 1 2025-05-07T19:46:31.4863454Z #define _GLIBCXX_HAVE_LINUX_FUTEX 1 2025-05-07T19:46:31.4863768Z #define __UINT_LEAST32_MAX__ 0xffffffffU 2025-05-07T19:46:31.4864082Z #define _GLIBCXX_HAVE_QUICK_EXIT 1 2025-05-07T19:46:31.4864477Z #define __glibcxx_requires_irreflexive_pred2(_First,_Last,_Pred) 2025-05-07T19:46:31.4864886Z #define LONG_MIN (-LONG_MAX - 1L) 2025-05-07T19:46:31.4865180Z #define _GLIBCXX_HAVE_SINCOSF 1 2025-05-07T19:46:31.4865461Z #define _IO_UNITBUF 020000 
2025-05-07T19:46:31.4865717Z #define _GLIBCXX_HAVE_SINCOSL 1 2025-05-07T19:46:31.4865993Z #define __FD_SETSIZE 1024 2025-05-07T19:46:31.4866250Z #define getc(_fp) _IO_getc (_fp) 2025-05-07T19:46:31.4866620Z #define be32toh(x) __bswap_32 (x) 2025-05-07T19:46:31.4867053Z #define _GLIBCXX_PACKAGE__GLIBCXX_VERSION "version-unused" 2025-05-07T19:46:31.4867435Z #define __FLT32X_HAS_DENORM__ 1 2025-05-07T19:46:31.4867711Z #define __INT_FAST16_TYPE__ long int 2025-05-07T19:46:31.4868037Z #define isxdigit_l(c,l) __isxdigit_l ((c), (l)) 2025-05-07T19:46:31.4868368Z #define _GLIBCXX_HAVE_GETIPINFO 1 2025-05-07T19:46:31.4868659Z #define __MMX_WITH_SSE__ 1 2025-05-07T19:46:31.4868980Z #define __isalnum_l(c,l) __isctype_l((c), _ISalnum, (l)) 2025-05-07T19:46:31.4869332Z #define _WCHAR_T_DEFINED_ 2025-05-07T19:46:31.4869631Z #define cudaIpcMemLazyEnablePeerAccess 0x01 2025-05-07T19:46:31.4869968Z #define _GLIBCXX_HAVE_AT_QUICK_EXIT 1 2025-05-07T19:46:31.4870275Z #define __INO_T_MATCHES_INO64_T 1 2025-05-07T19:46:31.4870550Z #define __USE_POSIX199506 1 2025-05-07T19:46:31.4870806Z #define _FEATURES_H 1 2025-05-07T19:46:31.4871046Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:46:31.4871470Z #define _PSTL_PRAGMA_SIMD_REDUCTION(PRM) _PSTL_PRAGMA(omp simd reduction(PRM)) 2025-05-07T19:46:31.4872096Z #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) 2025-05-07T19:46:31.4872411Z #define __stub_getmsg 2025-05-07T19:46:31.4872635Z #define _IO_FIXED 010000 2025-05-07T19:46:31.4872884Z #define __cpp_lib_addressof_constexpr 201603 2025-05-07T19:46:31.4873189Z #define _GLIBCXX11_USE_C99_STDIO 1 2025-05-07T19:46:31.4873444Z #define __stub_setlogin 2025-05-07T19:46:31.4873678Z #define __stub_fattach 2025-05-07T19:46:31.4873902Z #define __cplusplus 201703L 2025-05-07T19:46:31.4874158Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:46:31.4874427Z #define _STRUCT_TIMEVAL 1 2025-05-07T19:46:31.4874679Z #define INFINITY (__builtin_inff()) 2025-05-07T19:46:31.4874947Z #define _IO_UNBUFFERED 
2 2025-05-07T19:46:31.4875413Z #define cudaStreamAttributeSynchronizationPolicy cudaLaunchAttributeSynchronizationPolicy 2025-05-07T19:46:31.4875941Z #define _IO_INTERNAL 010 2025-05-07T19:46:31.4876170Z #define __DEC32_MIN__ 1E-95DF 2025-05-07T19:46:31.4876504Z #define cudaKernelNodeAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:31.4876845Z #define __dev_t_defined 2025-05-07T19:46:31.4877074Z #define __DEPRECATED 1 2025-05-07T19:46:31.4877289Z #define __S32_TYPE int 2025-05-07T19:46:31.4877533Z #define __cpp_rvalue_references 200610L 2025-05-07T19:46:31.4877813Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:46:31.4878068Z #define _IO_fpos_t _G_fpos_t 2025-05-07T19:46:31.4878314Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:46:31.4878902Z #define cudaKernelNodeAttributePreferredSharedMemoryCarveout cudaLaunchAttributePreferredSharedMemoryCarveout 2025-05-07T19:46:31.4879539Z #define _G_HAVE_MREMAP 1 2025-05-07T19:46:31.4879834Z #define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:46:31.4880171Z #define OVERFLOW 3 2025-05-07T19:46:31.4880403Z #define __toascii_l(c,l) ((l), __toascii (c)) 2025-05-07T19:46:31.4880706Z #define __DEC128_EPSILON__ 1E-33DL 2025-05-07T19:46:31.4880973Z #define __SM_32_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:31.4881307Z #define _GLIBCXX_DEFAULT_ABI_TAG _GLIBCXX_ABI_TAG_CXX11 2025-05-07T19:46:31.4881636Z #define __SSE2_MATH__ 1 2025-05-07T19:46:31.4881866Z #define __ATOMIC_HLE_RELEASE 131072 2025-05-07T19:46:31.4882168Z #define __FSFILCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:31.4882456Z #define _IO_STDIO_H 2025-05-07T19:46:31.4882694Z #define PDP_ENDIAN __PDP_ENDIAN 2025-05-07T19:46:31.4882969Z #define isspace_l(c,l) __isspace_l ((c), (l)) 2025-05-07T19:46:31.4883278Z #define __cudaCDP2Memcpy2DAsync 2025-05-07T19:46:31.4883558Z #define __PTRDIFF_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.4883865Z #define _GLIBCXX_HAVE_STRERROR_R 1 2025-05-07T19:46:31.4884111Z #define __amd64 1 2025-05-07T19:46:31.4884330Z 
#define _POSIX_TZNAME_MAX 6 2025-05-07T19:46:31.4884586Z #define __cudaCDP2Memset3DAsync 2025-05-07T19:46:31.4884848Z #define __SYSCALL_WORDSIZE 64 2025-05-07T19:46:31.4885126Z #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY 1 2025-05-07T19:46:31.4885411Z #define _EXT_TYPE_TRAITS 1 2025-05-07T19:46:31.4885745Z #define _GLIBCXX_HAVE_POSIX_SEMAPHORE 1 2025-05-07T19:46:31.4886084Z #define _POSIX_RE_DUP_MAX 255 2025-05-07T19:46:31.4886341Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:46:31.4886572Z #define __bounded 2025-05-07T19:46:31.4886788Z #define _GLIBCXX_HAVE_ACOSL 1 2025-05-07T19:46:31.4887040Z #define __USECONDS_T_TYPE __U32_TYPE 2025-05-07T19:46:31.4887326Z #define _IO_DELETE_DONT_CLOSE 0x40 2025-05-07T19:46:31.4887603Z #define __BEGIN_NAMESPACE_STD 2025-05-07T19:46:31.4887856Z #define _PTRDIFF_T_DECLARED 2025-05-07T19:46:31.4888127Z #define __OFF_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.4888432Z #define __W_STOPCODE(sig) ((sig) << 8 | 0x7f) 2025-05-07T19:46:31.4888837Z #define cudaStreamAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:31.4889230Z #define _GLIBCXX_HAVE_NETDB_H 1 2025-05-07T19:46:31.4889494Z #define __SM_20_INTRINSICS_HPP__ 2025-05-07T19:46:31.4889817Z #define __cpp_lib_has_unique_object_representations 201606 2025-05-07T19:46:31.4890167Z #define STA_PLL 0x0001 2025-05-07T19:46:31.4890402Z #define __ATOMIC_HLE_ACQUIRE 65536 2025-05-07T19:46:31.4890661Z #define __GNUG__ 11 2025-05-07T19:46:31.4890888Z #define _GLIBCXX_USE_GET_NPROCS 1 2025-05-07T19:46:31.4891135Z #define _T_WCHAR 2025-05-07T19:46:31.4891356Z #define __cudaCDP2GetDeviceCount 2025-05-07T19:46:31.4891628Z #define __specialization_static 2025-05-07T19:46:31.4891921Z #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL 2025-05-07T19:46:31.4892221Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:46:31.4892475Z #define cudaArraySparse 0x40 2025-05-07T19:46:31.4892720Z #define STA_PPSFREQ 0x0002 2025-05-07T19:46:31.4892995Z #define _IO_stdin ((_IO_FILE*)(&_IO_2_1_stdin_)) 
2025-05-07T19:46:31.4893291Z #define _WCHAR_T 2025-05-07T19:46:31.4893496Z #define __cudaCDP2Free 2025-05-07T19:46:31.4894151Z #define __FD_ZERO(fdsp) do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " __FD_ZERO_STOS : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&__FDS_BITS (fdsp)[0]) : "memory"); } while (0) 2025-05-07T19:46:31.4894856Z #define __cpp_nsdmi 200809L 2025-05-07T19:46:31.4895277Z #define __glibcxx_min_b(T,B) (__glibcxx_signed_b (T,B) ? -__glibcxx_max_b (T,B) - 1 : (T)0) 2025-05-07T19:46:31.4895719Z #define __FLT64X_MIN_EXP__ (-16381) 2025-05-07T19:46:31.4895991Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:46:31.4896251Z #define cudaArrayCubemap 0x04 2025-05-07T19:46:31.4896575Z #define _PSTL_MONOTONIC_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:31.4896926Z #define _GLIBCXX_UTILITY 1 2025-05-07T19:46:31.4897155Z #define __NO_CTYPE 1 2025-05-07T19:46:31.4897382Z #define __stub_bdflush 2025-05-07T19:46:31.4897728Z #define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter) 2025-05-07T19:46:31.4898145Z #define __CORRECT_ISO_CPP_STRING_H_PROTO 2025-05-07T19:46:31.4898435Z #define _GLIBCXX_STDC_HEADERS 1 2025-05-07T19:46:31.4898701Z #define __LONG_LONG_WIDTH__ 64 2025-05-07T19:46:31.4898962Z #define __cpp_initializer_lists 200806L 2025-05-07T19:46:31.4899268Z #define _GLIBCXX_HAVE_NETINET_TCP_H 1 2025-05-07T19:46:31.4899639Z #define __U16_TYPE unsigned short int 2025-05-07T19:46:31.4900157Z #define __glibcxx_requires_can_increment(_First,_Size) 2025-05-07T19:46:31.4900761Z #define _GLIBCXX_HAVE_SYS_PARAM_H 1 2025-05-07T19:46:31.4901049Z #define __FLT32_MAX_EXP__ 128 2025-05-07T19:46:31.4901346Z #define cudaHostRegisterIoMemory 0x04 2025-05-07T19:46:31.4901700Z #define __FD_MASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) 2025-05-07T19:46:31.4902067Z #define __cpp_lib_is_invocable 201703 2025-05-07T19:46:31.4902351Z #define _IO_STDIO 040000 2025-05-07T19:46:31.4902692Z #define _SIGSET_NWORDS (1024 
/ (8 * sizeof (unsigned long int))) 2025-05-07T19:46:31.4903102Z #define cudaSurfaceType1DLayered 0xF1 2025-05-07T19:46:31.4903429Z #define cudaArraySurfaceLoadStore 0x02 2025-05-07T19:46:31.4903743Z #define _PTRDIFF_T 2025-05-07T19:46:31.4903967Z #define _MOVE_H 1 2025-05-07T19:46:31.4904214Z #define __cpp_hex_float 201603L 2025-05-07T19:46:31.4904601Z #define ADJ_TAI 0x0080 2025-05-07T19:46:31.4904852Z #define __ptrvalue 2025-05-07T19:46:31.4905081Z #define _GLIBCXX_HOSTED 1 2025-05-07T19:46:31.4905448Z #define __GXX_ABI_VERSION 1016 2025-05-07T19:46:31.4905748Z #define __WTERMSIG(status) ((status) & 0x7f) 2025-05-07T19:46:31.4906079Z #define MATH_ERREXCEPT 2 2025-05-07T19:46:31.4906354Z #define _GLIBCXX_HAS_GTHREADS 1 2025-05-07T19:46:31.4906648Z #define cudaTextureType2DLayered 0xF2 2025-05-07T19:46:31.4907082Z #define __isleap(year) ((year) % 4 == 0 && ((year) % 100 != 0 || (year) % 400 == 0)) 2025-05-07T19:46:31.4907490Z #define __USE_GNU 1 2025-05-07T19:46:31.4907743Z #define __FLT128_HAS_INFINITY__ 1 2025-05-07T19:46:31.4908033Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:46:31.4908329Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:46:31.4908737Z #define __FD_CLR(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] &= ~__FD_MASK (d))) 2025-05-07T19:46:31.4909168Z #define WEXITED 4 2025-05-07T19:46:31.4909394Z #define _IO_NO_READS 4 2025-05-07T19:46:31.4909722Z #define cudaGraphKernelNodePortLaunchCompletion 2 2025-05-07T19:46:31.4910098Z #define M_LOG2E 1.4426950408889634074 2025-05-07T19:46:31.4910394Z #define _POSIX_SYMLINK_MAX 255 2025-05-07T19:46:31.4910721Z #define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1 2025-05-07T19:46:31.4911050Z #define __uid_t_defined 2025-05-07T19:46:31.4911319Z #define __FD_ELT(d) ((d) / __NFDBITS) 2025-05-07T19:46:31.4911621Z #define _GLIBCXX_USE_STD_SPEC_FUNCS 1 2025-05-07T19:46:31.4911920Z #define WNOHANG 1 2025-05-07T19:46:31.4912176Z #define alloca(size) __builtin_alloca (size) 2025-05-07T19:46:31.4912511Z #define 
_GLIBCXX_HAVE_HYPOTF 1 2025-05-07T19:46:31.4912948Z #define cudaEventDefault 0x00 2025-05-07T19:46:31.4913353Z #define __maxnreg__(a) __attribute__((maxnreg(a))) 2025-05-07T19:46:31.4913681Z #define NL_SETMAX INT_MAX 2025-05-07T19:46:31.4913907Z #define __x86_64 1 2025-05-07T19:46:31.4914143Z #define __cudaCDP2LaunchDevice 2025-05-07T19:46:31.4914531Z #define __REDIRECT(name,proto,alias) name proto __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:31.4915025Z #define _GLIBCXX_BEGIN_NAMESPACE_CXX11 namespace __cxx11 { 2025-05-07T19:46:31.4915528Z #define __extern_always_inline extern __always_inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:31.4915973Z #define __PTRDIFF_T 2025-05-07T19:46:31.4916305Z #define __exctype_l(name) extern int name (int, __locale_t) __THROW 2025-05-07T19:46:31.4916676Z #define _GLIBCXX_HAVE_FINITEL 1 2025-05-07T19:46:31.4916957Z #define __SM_35_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:31.4917239Z #define _Mlong_double_ long double 2025-05-07T19:46:31.4917525Z #define __cpp_lambdas 200907L 2025-05-07T19:46:31.4917768Z #define _IO_DEC 020 2025-05-07T19:46:31.4918000Z #define _GLIBCXX_HAVE_SINHL 1 2025-05-07T19:46:31.4918260Z #define _POSIX_CLOCKRES_MIN 20000000 2025-05-07T19:46:31.4918555Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:46:31.4918827Z #define ADJ_TIMECONST 0x0020 2025-05-07T19:46:31.4919097Z #define _GLIBCXX_HAVE_SQRTL 1 2025-05-07T19:46:31.4919397Z #define __cudaCDP2DeviceGetSharedMemConfig 2025-05-07T19:46:31.4919715Z #define _GLIBCXX_HAVE_STDALIGN_H 1 2025-05-07T19:46:31.4920167Z #define _ANSI_STDDEF_H 2025-05-07T19:46:31.4920488Z #define _GLIBCXX_MOVE(__val) std::move(__val) 2025-05-07T19:46:31.4920824Z #define _GLIBCXX_HAVE_STRERROR_L 1 2025-05-07T19:46:31.4921202Z #define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64 2025-05-07T19:46:31.4921622Z #define _GLIBCXX_USE_DEV_RANDOM 1 2025-05-07T19:46:31.4921916Z #define _STL_ITERATOR_BASE_TYPES_H 1 2025-05-07T19:46:31.4922238Z #define 
__cpp_template_auto 201606L 2025-05-07T19:46:31.4922620Z #define __DBL_MIN__ double(2.22507385850720138309023271733240406e-308L) 2025-05-07T19:46:31.4923005Z #define _GLIBCXX_HAVE_SYS_SEM_H 1 2025-05-07T19:46:31.4923292Z #define __key_t_defined 2025-05-07T19:46:31.4923543Z #define _IO_MAGIC_MASK 0xFFFF0000 2025-05-07T19:46:31.4923937Z #define __cluster_dims__(...) __attribute__((cluster_dims(__VA_ARGS__))) 2025-05-07T19:46:31.4924426Z #define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128 2025-05-07T19:46:31.4924904Z #define __GNUC_VA_LIST 2025-05-07T19:46:31.4925305Z #define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:46:31.4925742Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:46:31.4926028Z #define CLOCK_REALTIME_COARSE 5 2025-05-07T19:46:31.4926317Z #define _GLIBCXX14_CONSTEXPR constexpr 2025-05-07T19:46:31.4926639Z #define __USE_XOPEN2KXSI 1 2025-05-07T19:46:31.4926898Z #define __WCOREFLAG 0x80 2025-05-07T19:46:31.4927173Z #define M_2_SQRTPI 1.12837916709551257390 2025-05-07T19:46:31.4927486Z #define cudaEventDisableTiming 0x02 2025-05-07T19:46:31.4927787Z #define __LP64__ 1 2025-05-07T19:46:31.4928040Z #define __isascii_l(c,l) ((l), __isascii (c)) 2025-05-07T19:46:31.4928385Z #define cudaStreamNonBlocking 0x01 2025-05-07T19:46:31.4928679Z #define _IO_off64_t __off64_t 2025-05-07T19:46:31.4928963Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.4929246Z #define __time_t_defined 1 2025-05-07T19:46:31.4929509Z #define _POSIX_SYMLOOP_MAX 8 2025-05-07T19:46:31.4929888Z #define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x 2025-05-07T19:46:31.4930275Z #define __USE_UNIX98 1 2025-05-07T19:46:31.4930543Z #define __MODE_T_TYPE __U32_TYPE 2025-05-07T19:46:31.4930825Z #define CLOCK_REALTIME_ALARM 8 2025-05-07T19:46:31.4931099Z #define _GLIBCXX_HAVE_STRINGS_H 1 2025-05-07T19:46:31.4931390Z #define __LEAF_ATTR __attribute__ ((__leaf__)) 2025-05-07T19:46:31.4931709Z #define __DECIMAL_BID_FORMAT__ 1 
2025-05-07T19:46:31.4932074Z #define SEEK_CUR 1 2025-05-07T19:46:31.4932296Z #define __RLIM64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:31.4932566Z #define _ASSERT_H 1 2025-05-07T19:46:31.4933138Z #define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME,OP) _PSTL_PRAGMA(omp declare reduction(NAME:OP : omp_out(omp_in)) initializer(omp_priv = omp_orig)) 2025-05-07T19:46:31.4933793Z #define _GLIBCXX_USE_DEPRECATED 1 2025-05-07T19:46:31.4934058Z #define CHAR_MAX SCHAR_MAX 2025-05-07T19:46:31.4934315Z #define _GLIBCXX_HAVE_SETENV 1 2025-05-07T19:46:31.4934575Z #define NL_ARGMAX _POSIX_ARG_MAX 2025-05-07T19:46:31.4934850Z #define _GLIBCXX_USE_UTIMENSAT 1 2025-05-07T19:46:31.4935216Z #define __extern_inline extern __inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:31.4935635Z #define _GLIBCXX_DEBUG_ONLY(_Statement) 2025-05-07T19:46:31.4936308Z #define _IO_putc_unlocked(_ch,_fp) (_IO_BE ((_fp)->_IO_write_ptr >= (_fp)->_IO_write_end, 0) ? __overflow (_fp, (unsigned char) (_ch)) : (unsigned char) (*(_fp)->_IO_write_ptr++ = (_ch))) 2025-05-07T19:46:31.4936974Z #define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 2025-05-07T19:46:31.4937266Z #define _IO_BOOLALPHA 0200000 2025-05-07T19:46:31.4937607Z #define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT (_MSC_VER >= 1912) 2025-05-07T19:46:31.4937992Z #define _GLIBCXX_PACKAGE_URL "" 2025-05-07T19:46:31.4938251Z #define __FLT64_MIN_10_EXP__ (-307) 2025-05-07T19:46:31.4938533Z #define cudaArrayDefault 0x00 2025-05-07T19:46:31.4938808Z #define __cudaCDP2LaunchDeviceV2 2025-05-07T19:46:31.4939089Z #define __FDS_BITS(set) ((set)->fds_bits) 2025-05-07T19:46:31.4939437Z #define TLOSS 5 2025-05-07T19:46:31.4939640Z #define __ssize_t_defined 2025-05-07T19:46:31.4940066Z #define __CUDACC_VER_BUILD__ 61 2025-05-07T19:46:31.4940345Z #define ULONG_MAX (LONG_MAX * 2UL + 1UL) 2025-05-07T19:46:31.4940732Z #define __FLT64X_DECIMAL_DIG__ 21 2025-05-07T19:46:31.4941015Z #define _POSIX_HIWAT _POSIX_PIPE_BUF 2025-05-07T19:46:31.4941320Z #define __DEC128_MIN__ 1E-6143DL 
2025-05-07T19:46:31.4941614Z #define __cudaCDP2EventRecordWithFlags 2025-05-07T19:46:31.4941942Z #define _GLIBCXX_ATOMIC_BUILTINS 1 2025-05-07T19:46:31.4942257Z #define cudaPeerAccessDefault 0x00 2025-05-07T19:46:31.4942547Z #define _GLIBCXX_HAVE_SYS_SOCKET_H 1 2025-05-07T19:46:31.4942843Z #define __REGISTER_PREFIX__ 2025-05-07T19:46:31.4943111Z #define __UINT16_MAX__ 0xffff 2025-05-07T19:46:31.4943461Z #define __glibcxx_requires_sorted_set(_First1,_Last1,_First2) 2025-05-07T19:46:31.4943835Z #define _IOS_NOREPLACE 64 2025-05-07T19:46:31.4944087Z #define __cdecl 2025-05-07T19:46:31.4944413Z #define cudaEventInterprocess 0x04 2025-05-07T19:46:31.4944823Z #define M_SQRT1_2l 0.707106781186547524400844362104849039L 2025-05-07T19:46:31.4945159Z #define LOGIN_NAME_MAX 256 2025-05-07T19:46:31.4945422Z #define _IO_TIED_PUT_GET 0x400 2025-05-07T19:46:31.4945699Z #define X_TLOSS 1.41484755040568800000e+16 2025-05-07T19:46:31.4946002Z #define CUDA_IPC_HANDLE_SIZE 64 2025-05-07T19:46:31.4946294Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:46:31.4946613Z #define __attribute_pure__ __attribute__ ((__pure__)) 2025-05-07T19:46:31.4946974Z #define __TEXTURE_TYPES_H__ 2025-05-07T19:46:31.4947405Z #define __NV_GLIBCXX_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:31.4947883Z #define ADJ_NANO 0x2000 2025-05-07T19:46:31.4948196Z #define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32 2025-05-07T19:46:31.4948590Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:46:31.4948899Z #define _GLIBCXX_HAVE_ISWBLANK 1 2025-05-07T19:46:31.4949175Z #define __FLT_DIG__ 6 2025-05-07T19:46:31.4949559Z #define __REDIRECT_LDBL(name,proto,alias) __REDIRECT (name, proto, alias) 2025-05-07T19:46:31.4949987Z #define __NO_INLINE__ 1 2025-05-07T19:46:31.4950312Z #define _PSTL_EARLYEXIT_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:31.4950684Z #define _POSIX_NGROUPS_MAX 8 2025-05-07T19:46:31.4950963Z #define ADJ_STATUS 0x0010 
2025-05-07T19:46:31.4951229Z #define __cudaCDP2MemcpyAsync_ptsz 2025-05-07T19:46:31.4951540Z #define CLOCK_BOOTTIME_ALARM 9 2025-05-07T19:46:31.4951833Z #define LONG_LONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:31.4952239Z #define _GLIBCXX_HAVE_OBSOLETE_ISNAN 1 2025-05-07T19:46:31.4952528Z #define __DEC_EVAL_METHOD__ 2 2025-05-07T19:46:31.4952900Z #define cudaStreamGraphFireAndForget (cudaStream_t)0x0200000000000000 2025-05-07T19:46:31.4953322Z #define _GLIBCXX_HAVE_ALIGNED_ALLOC 1 2025-05-07T19:46:31.4953658Z #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL 2025-05-07T19:46:31.4954010Z #define CHAR_MIN SCHAR_MIN 2025-05-07T19:46:31.4954244Z #define MAX_CANON 255 2025-05-07T19:46:31.4954476Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:46:31.4954725Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:46:31.4955003Z #define _GLIBCXX_HAVE_COMPLEX_H 1 2025-05-07T19:46:31.4955294Z #define _PSTL_PRAGMA_VECTOR_UNALIGNED 2025-05-07T19:46:31.4955597Z #define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX 2025-05-07T19:46:31.4955901Z #define _GLIBCXX_HAVE_HYPOT 1 2025-05-07T19:46:31.4956166Z #define __cudaCDP2Memset2DAsync_ptsz 2025-05-07T19:46:31.4956486Z #define _GLIBCXX_TR1_MODIFIED_BESSEL_FUNC_TCC 1 2025-05-07T19:46:31.4956789Z #define __VERSION__ "11.4.0" 2025-05-07T19:46:31.4957053Z #define _GLIBCXX11_USE_C99_STDLIB 1 2025-05-07T19:46:31.4957334Z #define cudaHostRegisterMapped 0x02 2025-05-07T19:46:31.4957628Z #define _GLIBCXX_HAVE_INT64_T 1 2025-05-07T19:46:31.4957912Z #define _GLIBCXX_USE_CONSTEXPR constexpr 2025-05-07T19:46:31.4958218Z #define FD_ZERO(fdsetp) __FD_ZERO (fdsetp) 2025-05-07T19:46:31.4958517Z #define __UINT64_C(c) c ## UL 2025-05-07T19:46:31.4958770Z #define MOD_OFFSET ADJ_OFFSET 2025-05-07T19:46:31.4959024Z #define _SYS_TYPES_H 1 2025-05-07T19:46:31.4959255Z #define AIO_PRIO_DELTA_MAX 20 2025-05-07T19:46:31.4959517Z #define _GLIBCXX_HAVE_TANHF 1 2025-05-07T19:46:31.4959758Z #define _SYS_CDEFS_H 1 2025-05-07T19:46:31.4959998Z #define _GLIBCXX_HAVE_TANHL 1 
2025-05-07T19:46:31.4960264Z #define __cpp_unicode_characters 201411L 2025-05-07T19:46:31.4960566Z #define _IO_ERR_SEEN 0x20 2025-05-07T19:46:31.4960823Z #define _GLIBCXX_USE_DECIMAL_FLOAT 1 2025-05-07T19:46:31.4961113Z #define __cudaCDP2StreamDestroy 2025-05-07T19:46:31.4961389Z #define FP_SUBNORMAL 3 2025-05-07T19:46:31.4961630Z #define cudaOccupancyDefault 0x00 2025-05-07T19:46:31.4961911Z #define _INITIALIZER_LIST 2025-05-07T19:46:31.4962152Z #define _STDC_PREDEF_H 1 2025-05-07T19:46:31.4962415Z #define _GLIBCXX_PACKAGE_BUGREPORT "" 2025-05-07T19:46:31.4962701Z #define _GLIBCXX_HAVE_MODF 1 2025-05-07T19:46:31.4962964Z #define _IO_file_flags _flags 2025-05-07T19:46:31.4963279Z #define __USE_XOPEN2K8 1 2025-05-07T19:46:31.4963533Z #define htobe64(x) __bswap_64 (x) 2025-05-07T19:46:31.4963862Z #define _OLD_STDIO_MAGIC 0xFABC0000 2025-05-07T19:46:31.4964126Z #define HUGE 3.40282347e+38F 2025-05-07T19:46:31.4964387Z #define __cpp_lib_is_null_pointer 201309 2025-05-07T19:46:31.4964759Z #define WEXITSTATUS(status) __WEXITSTATUS (__WAIT_INT (status)) 2025-05-07T19:46:31.4965163Z #define islower_l(c,l) __islower_l ((c), (l)) 2025-05-07T19:46:31.4965463Z #define _GLIBCXX_USE_CXX11_ABI 1 2025-05-07T19:46:31.4965743Z #define _GLIBCXX_HAVE_SYMLINK 1 2025-05-07T19:46:31.4965987Z #define _BSD_SOURCE 1 2025-05-07T19:46:31.4966220Z #define _GLIBCXX_THROW(_EXC) 2025-05-07T19:46:31.4967095Z #define _GLIBCXX_HAS_NESTED_TYPE(_NTYPE) template<typename _Tp, typename = __void_t<>> struct __has_ ##_NTYPE : false_type { }; template<typename _Tp> struct __has_ ##_NTYPE<_Tp, __void_t<typename _Tp::_NTYPE>> : true_type { }; 2025-05-07T19:46:31.4967957Z #define __catch(X) catch(X) 2025-05-07T19:46:31.4968217Z #define __INT_LEAST32_MAX__ 0x7fffffff 2025-05-07T19:46:31.4968498Z #define LINE_MAX _POSIX2_LINE_MAX 2025-05-07T19:46:31.4968768Z #define __TIMER_T_TYPE void * 2025-05-07T19:46:31.4969002Z #define __STRING(x) #x 2025-05-07T19:46:31.4969242Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:31.4969501Z #define _T_PTRDIFF_ 2025-05-07T19:46:31.4969742Z 
#define _GLIBCXX_USE_NOEXCEPT noexcept 2025-05-07T19:46:31.4970052Z #define cudaEventWaitExternal 0x01 2025-05-07T19:46:31.4970313Z #define __unbounded 2025-05-07T19:46:31.4970559Z #define __DEVICE_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:31.4970838Z #define __FLT128_MAX_EXP__ 16384 2025-05-07T19:46:31.4971117Z #define __INO_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:31.4971404Z #define be16toh(x) __bswap_16 (x) 2025-05-07T19:46:31.4971682Z #define __cpp_lib_is_final 201402L 2025-05-07T19:46:31.4971966Z #define _GLIBCXX_BEGIN_NAMESPACE_CONTAINER 2025-05-07T19:46:31.4972296Z #define LONG_LONG_MIN (-LONG_LONG_MAX - 1LL) 2025-05-07T19:46:31.4972598Z #define __MATH_DECLARE_LDOUBLE 1 2025-05-07T19:46:31.4972879Z #define __managed__ __location__(managed) 2025-05-07T19:46:31.4973183Z #define _POSIX2_EXPR_NEST_MAX 32 2025-05-07T19:46:31.4973574Z #define __GNUC_PREREQ(maj,min) ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:31.4974005Z #define _POSIX_STREAM_MAX 8 2025-05-07T19:46:31.4974256Z #define __LIBRARY_TYPES_H__ 2025-05-07T19:46:31.4974636Z #define _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_END_NAMESPACE_CXX11 2025-05-07T19:46:31.4975032Z #define __FLT32_MANT_DIG__ 24 2025-05-07T19:46:31.4975284Z #define _SYS_SIZE_T_H 2025-05-07T19:46:31.4975564Z #define _PSTL_VERSION_MINOR ((_PSTL_VERSION % 1000) / 10) 2025-05-07T19:46:31.4975907Z #define _GLIBCXX_STDLIB_H 1 2025-05-07T19:46:31.4976187Z #define isupper_l(c,l) __isupper_l ((c), (l)) 2025-05-07T19:46:31.4976471Z #define _CRTIMP 2025-05-07T19:46:31.4976697Z #define _GLIBCXX_CXX_CONFIG_H 1 2025-05-07T19:46:31.4976985Z #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:31.4977316Z #define STA_PPSJITTER 0x0200 2025-05-07T19:46:31.4977656Z #define _IO_feof_unlocked(__fp) (((__fp)->_flags & _IO_EOF_SEEN) != 0) 2025-05-07T19:46:31.4978077Z #define __SUSECONDS_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.4978386Z #define _GLIBCXX_HAVE_ISINFF 1 2025-05-07T19:46:31.4978667Z 
#define __glibcxx_requires_subscript(_N) 2025-05-07T19:46:31.4978948Z #define __SIZE_T__ 2025-05-07T19:46:31.4979162Z #define __stub_gtty 2025-05-07T19:46:31.4979468Z #define __pid_t_defined 2025-05-07T19:46:31.4979712Z #define _GLIBCXX_FWDREF(_Tp) _Tp&& 2025-05-07T19:46:31.4980200Z #define __NLINK_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:31.4980531Z #define __glibcxx_function_requires(...) 2025-05-07T19:46:31.4980852Z #define __SM_80_RT_HPP__ 2025-05-07T19:46:31.4981098Z #define __need_clockid_t 2025-05-07T19:46:31.4981356Z #define SSIZE_MAX LONG_MAX 2025-05-07T19:46:31.4981619Z #define _GLIBCXX_HAVE_USELOCALE 1 2025-05-07T19:46:31.4981963Z #define __glibcxx_requires_string_len(_String,_Len) 2025-05-07T19:46:31.4982378Z #define _IO_HEX 0100 2025-05-07T19:46:31.4982718Z #define __NFDBITS (8 * (int) sizeof (__fd_mask)) 2025-05-07T19:46:31.4983082Z #define cudaExternalMemoryDedicated 0x1 2025-05-07T19:46:31.4983181Z #define _GLIBCXX_HAVE_TGMATH_H 1 2025-05-07T19:46:31.4983285Z #define _GLIBCXX11_USE_C99_COMPLEX 1 2025-05-07T19:46:31.4983522Z #define _GLIBCXX17_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:31.4983660Z #define ispunct_l(c,l) __ispunct_l ((c), (l)) 2025-05-07T19:46:31.4983766Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:46:31.4983870Z #define __cudaGet_blockDim() blockDim 2025-05-07T19:46:31.4983991Z #define __cudaCDP2Memcpy3DAsync 2025-05-07T19:46:31.4984092Z #define __cudaCDP2MemcpyAsync 2025-05-07T19:46:31.4984176Z #define __stub_sstk 2025-05-07T19:46:31.4984271Z #define _IO_IN_BACKUP 0x100 2025-05-07T19:46:31.4984447Z #define _GLIBCXX_USE_C99_STDLIB _GLIBCXX11_USE_C99_STDLIB 2025-05-07T19:46:31.4984531Z #define __wur 2025-05-07T19:46:31.4984655Z #define isprint_l(c,l) __isprint_l ((c), (l)) 2025-05-07T19:46:31.4984758Z #define _G_HAVE_MMAP 1 2025-05-07T19:46:31.4984845Z #define _IO_OCT 040 2025-05-07T19:46:31.4984941Z #define __FLT128_HAS_DENORM__ 1 2025-05-07T19:46:31.4985046Z #define NL_MSGMAX INT_MAX 
2025-05-07T19:46:31.4985139Z #define _GLIBCXX_USE_LFS 1 2025-05-07T19:46:31.4985270Z #define cudaDeviceScheduleBlockingSync 0x04 2025-05-07T19:46:31.4985364Z #define _POSIX_RTSIG_MAX 8 2025-05-07T19:46:31.4985485Z #define _GLIBCXX_NOEXCEPT noexcept 2025-05-07T19:46:31.4985683Z #define __glibcxx_requires_partitioned_lower(_First,_Last,_Value) 2025-05-07T19:46:31.4985777Z #define __FLT32_DECIMAL_DIG__ 9 2025-05-07T19:46:31.4985882Z #define _STL_ALGOBASE_H 1 2025-05-07T19:46:31.4985992Z #define __cudaCDP2MemsetAsync_ptsz 2025-05-07T19:46:31.4986082Z #define __off64_t_defined 2025-05-07T19:46:31.4986180Z #define _GLIBCXX_WEAK_DEFINITION 2025-05-07T19:46:31.4986283Z #define __FLT128_DIG__ 33 2025-05-07T19:46:31.4986388Z #define _GLIBCXX_USE_C99_INTTYPES_TR1 1 2025-05-07T19:46:31.4986490Z #define _GLIBCXX_HAVE_LOCALE_H 1 2025-05-07T19:46:31.4986595Z #define __INT32_C(c) c 2025-05-07T19:46:31.4986695Z #define __DEC64_EPSILON__ 1E-15DD 2025-05-07T19:46:31.4986792Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:46:31.4986887Z #define __DEC128_MIN_EXP__ (-6142) 2025-05-07T19:46:31.4986994Z #define __PDP_ENDIAN 3412 2025-05-07T19:46:31.4987084Z #define _ISOC95_SOURCE 1 2025-05-07T19:46:31.4987180Z #define _IO_fpos64_t _G_fpos64_t 2025-05-07T19:46:31.4987343Z #define M_PI_2l 1.570796326794896619231321691639751442L 2025-05-07T19:46:31.4987442Z #define BYTE_ORDER __BYTE_ORDER 2025-05-07T19:46:31.4987533Z #define __SM_90_RT_HPP__ 2025-05-07T19:46:31.4987636Z #define __INT_FAST32_TYPE__ long int 2025-05-07T19:46:31.4987755Z #define __have_pthread_attr_t 1 2025-05-07T19:46:31.4987859Z #define _GLIBCXX_HAVE_LIMIT_DATA 1 2025-05-07T19:46:31.4988103Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_BEGIN_NAMESPACE_CXX11 2025-05-07T19:46:31.4988235Z #define __cudaCDP2StreamWaitEvent 2025-05-07T19:46:31.4988337Z #define __cudaCDP2EventRecord 2025-05-07T19:46:31.4988437Z #define _BITS_TYPESIZES_H 1 2025-05-07T19:46:31.4988529Z #define htole32(x) (x) 2025-05-07T19:46:31.4988803Z 
#define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessorWithFlags 2025-05-07T19:46:31.4988931Z #define __SYSCALL_SLONG_TYPE __SLONGWORD_TYPE 2025-05-07T19:46:31.4989037Z #define _GLIBCXX_USE_C99_MATH_TR1 1 2025-05-07T19:46:31.4989223Z #define WSTOPSIG(status) __WSTOPSIG (__WAIT_INT (status)) 2025-05-07T19:46:31.4989370Z #define _GLIBCXX_USE_C99_MATH _GLIBCXX11_USE_C99_MATH 2025-05-07T19:46:31.4989506Z #define __UINT_LEAST16_TYPE__ short unsigned int 2025-05-07T19:46:31.4989670Z #define __WIFEXITED(status) (__WTERMSIG(status) == 0) 2025-05-07T19:46:31.4989769Z #define ADJ_OFFSET 0x0001 2025-05-07T19:46:31.4989875Z #define cudaArrayLayered 0x01 2025-05-07T19:46:31.4990054Z #define _PSTL_ICC_18_OMP_SIMD_BROKEN (__INTEL_COMPILER == 1800) 2025-05-07T19:46:31.4990189Z #define cudaEventRecordDefault 0x00 2025-05-07T19:46:31.4990345Z #define _GLIBCXX_HAVE_FMODF 1 2025-05-07T19:46:31.4990501Z #define _PSTL_PRAGMA_MESSAGE(x) 2025-05-07T19:46:31.4990605Z #define unix 1 2025-05-07T19:46:31.4990705Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:46:31.4990805Z #define _POSIX_CHILD_MAX 25 2025-05-07T19:46:31.4990904Z #define _POSIX_MAX_INPUT 255 2025-05-07T19:46:31.4991043Z #define __cudaCDP2DeviceGetCacheConfig 2025-05-07T19:46:31.4991136Z #define __USE_POSIX 1 2025-05-07T19:46:31.4991237Z #define __FD_ZERO_STOS "stosq" 2025-05-07T19:46:31.4991398Z #define _PSTL_VERSION_MAJOR (_PSTL_VERSION / 1000) 2025-05-07T19:46:31.4991497Z #define __THROWNL throw () 2025-05-07T19:46:31.4991594Z #define __cpp_rtti 199711L 2025-05-07T19:46:31.4991706Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:46:31.4991817Z #define __PMT(args) args 2025-05-07T19:46:31.4992050Z #define __UINT64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.4992193Z #define __va_arg_pack_len() __builtin_va_arg_pack_len () 2025-05-07T19:46:31.4992326Z #define __ULONGWORD_TYPE unsigned long int 2025-05-07T19:46:31.4992416Z #define _SIZE_T_DECLARED 2025-05-07T19:46:31.4992516Z #define _PSTL_STRING_AUX(x) #x 
2025-05-07T19:46:31.4992609Z #define __FLT_IS_IEC_60559__ 2 2025-05-07T19:46:31.4993032Z #define _PSTL_CPP14_MAKE_REVERSE_ITERATOR_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L || __cpp_lib_make_reverse_iterator == 201402) 2025-05-07T19:46:31.4993132Z #define _GLIBCXX_HAVE_LIMIT_AS 1 2025-05-07T19:46:31.4993228Z #define XATTR_LIST_MAX 65536 2025-05-07T19:46:31.4993337Z #define __CUDACC_VER_MAJOR__ 12 2025-05-07T19:46:31.4993475Z #define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE" 2025-05-07T19:46:31.4993561Z #define _WCHAR_T_H 2025-05-07T19:46:31.4993664Z #define __FLT64X_DIG__ 18 2025-05-07T19:46:31.4993751Z #define _IO_SHOWBASE 0200 2025-05-07T19:46:31.4993836Z #define _POSIX_QLIMIT 1 2025-05-07T19:46:31.4993928Z #define __INT8_TYPE__ signed char 2025-05-07T19:46:31.4994035Z #define __SURFACE_TYPES_H__ 2025-05-07T19:46:31.4994125Z #define __CUDA_ARCH__ 520 2025-05-07T19:46:31.4994229Z #define __cpp_digit_separators 201309L 2025-05-07T19:46:31.4994327Z #define __ELF__ 1 2025-05-07T19:46:31.4994423Z #define CLOCK_THREAD_CPUTIME_ID 3 2025-05-07T19:46:31.4994518Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:46:31.4994600Z #define STA_INS 0x0010 2025-05-07T19:46:31.4994708Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:46:31.4994874Z #define _toupper(c) ((int) (*__ctype_toupper_loc ())[(int) (c)]) 2025-05-07T19:46:31.4994964Z #define _BITS_BYTESWAP_H 1 2025-05-07T19:46:31.4995071Z #define __ID_T_TYPE __U32_TYPE 2025-05-07T19:46:31.4995178Z #define __TIME_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.4995285Z #define __DEVICE_DOUBLE_FUNCTIONS_HPP__ 2025-05-07T19:46:31.4995380Z #define _GLIBCXX_HAVE_MBSTATE_T 1 2025-05-07T19:46:31.4995498Z #define __cpp_lib_logical_traits 201510 2025-05-07T19:46:31.4995590Z #define ADJ_OFFSET_SS_READ 0xa001 2025-05-07T19:46:31.4995743Z #define __warnattr(msg) __attribute__((__warning__ (msg))) 2025-05-07T19:46:31.4995912Z #define _PSTL_PRAGMA_LOCATION " [Parallel STL message]: " 2025-05-07T19:46:31.4996011Z #define 
_IO_funlockfile(_fp) 2025-05-07T19:46:31.4996334Z #define cudaKernelNodeAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:31.4996459Z #define M_2_PIl 0.636619772367581343075535053490057448L 2025-05-07T19:46:31.4996564Z #define __DRIVER_TYPES_H__ 2025-05-07T19:46:31.4996650Z #define __FLT_RADIX__ 2 2025-05-07T19:46:31.4996748Z #define __INT_LEAST16_TYPE__ short int 2025-05-07T19:46:31.4996928Z #define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L 2025-05-07T19:46:31.4997021Z #define __UINTMAX_C(c) c ## UL 2025-05-07T19:46:31.4997116Z #define _GLIBCXX_USE_LSTAT 1 2025-05-07T19:46:31.4997234Z #define minor(dev) gnu_dev_minor (dev) 2025-05-07T19:46:31.4997328Z #define _POSIX_C_SOURCE 200809L 2025-05-07T19:46:31.4997427Z #define _GLIBCXX_HAVE_DIRENT_H 1 2025-05-07T19:46:31.4997527Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:46:31.4997695Z #define WORD_BIT 32 2025-05-07T19:46:31.4997779Z #define _IO_USER_BUF 1 2025-05-07T19:46:31.4997919Z #define __VECTOR_TYPES_H__ 2025-05-07T19:46:31.4998037Z #define __SM_20_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:31.4998141Z #define cudaHostAllocPortable 0x01 2025-05-07T19:46:31.4998237Z #define PTHREAD_STACK_MIN 16384 2025-05-07T19:46:31.4998333Z #define __long_double_t long double 2025-05-07T19:46:31.4998441Z #define _GLIBCXX_HAVE_ISINF 1 2025-05-07T19:46:31.4998530Z #define _POSIX_ARG_MAX 4096 2025-05-07T19:46:31.4999097Z #define cudaKernelNodeAttributeDeviceUpdatableKernelNode cudaLaunchAttributeDeviceUpdatableKernelNode 2025-05-07T19:46:31.4999196Z #define __k8 1 2025-05-07T19:46:31.4999401Z #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23) 2025-05-07T19:46:31.4999575Z #define __FLT32X_MIN__ 2.22507385850720138309023271733240406e-308F32x 2025-05-07T19:46:31.4999696Z #define __LDBL_REDIR(name,proto) name proto 2025-05-07T19:46:31.4999813Z #define __SIG_ATOMIC_MAX__ 0x7fffffff 2025-05-07T19:46:31.4999917Z #define __SM_30_INTRINSICS_HPP__ 
2025-05-07T19:46:31.5000025Z #define _GLIBCXX_EXTERN_TEMPLATE 1 2025-05-07T19:46:31.5000135Z #define __blksize_t_defined 2025-05-07T19:46:31.5000372Z #define _IO_SHOWPOINT 0400 2025-05-07T19:46:31.5000480Z #define _GLIBCXX_HAVE_LIMIT_RSS 1 2025-05-07T19:46:31.5000599Z #define cudaDeviceLmemResizeToMax 0x10 2025-05-07T19:46:31.5000882Z #define _GLIBCXX_X86_RDRAND 1 2025-05-07T19:46:31.5000995Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:31.5001095Z #define _IO_IS_FILEBUF 0x2000 2025-05-07T19:46:31.5001214Z #define _GLIBCXX_USE_DUAL_ABI 1 2025-05-07T19:46:31.5001492Z #define __bswap_constant_16(x) ((unsigned short int) ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))) 2025-05-07T19:46:31.5001865Z #define cudaSignalExternalSemaphoresAsync __CUDART_API_PTSZ(cudaSignalExternalSemaphoresAsync_v2) 2025-05-07T19:46:31.5001988Z #define UCHAR_MAX (SCHAR_MAX * 2 + 1) 2025-05-07T19:46:31.5002088Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:46:31.5002179Z #define SEEK_SET 0 2025-05-07T19:46:31.5002279Z #define _GLIBCXX_TR1_GAMMA_TCC 1 2025-05-07T19:46:31.5002397Z #define __CUDA_API_VER_MINOR__ 8 2025-05-07T19:46:31.5002603Z #define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V))) 2025-05-07T19:46:31.5002709Z #define __cudaCDP2GetLastError 2025-05-07T19:46:31.5002823Z #define _GLIBCXX_HAVE_COSL 1 2025-05-07T19:46:31.5002916Z #define _MATH_H_MATHDEF 1 2025-05-07T19:46:31.5003273Z #define __bswap_constant_32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24)) 2025-05-07T19:46:31.5003376Z #define _GLIBCXX_USE_FLOAT128 1 2025-05-07T19:46:31.5003489Z #define _IO_FLAGS2_NOTCANCEL 2 2025-05-07T19:46:31.5003583Z #define __stub_sigreturn 2025-05-07T19:46:31.5003844Z #define __errordecl(name,msg) extern void name (void) __attribute__((__error__ (msg))) 2025-05-07T19:46:31.5003957Z #define _GLIBCXX_HAVE_UTIME_H 1 2025-05-07T19:46:31.5004052Z #define __HOST_CONFIG_H__ 2025-05-07T19:46:31.5004157Z #define 
_XOPEN_SOURCE_EXTENDED 1 2025-05-07T19:46:31.5004260Z #define CLOCK_TAI 11 2025-05-07T19:46:31.5004374Z #define _GLIBCXX_END_NAMESPACE_VERSION 2025-05-07T19:46:31.5004599Z #define __glibcxx_requires_sorted_set_pred(_First1,_Last1,_First2,_Pred) 2025-05-07T19:46:31.5004691Z #define __restrict_arr 2025-05-07T19:46:31.5004823Z #define _PSTL_PRAGMA_MESSAGE_POLICIES(x) 2025-05-07T19:46:31.5004973Z #define __glibcxx_requires_valid_range(_First,_Last) 2025-05-07T19:46:31.5005557Z #define strndupa(s,n) (__extension__ ({ const char *__old = (s); size_t __len = strnlen (__old, (n)); char *__new = (char *) __builtin_alloca (__len + 1); __new[__len] = '\0'; (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:31.5005763Z #define __attribute_artificial__ __attribute__ ((__artificial__)) 2025-05-07T19:46:31.5005852Z #define __USE_MISC 1 2025-05-07T19:46:31.5005962Z #define __UWORD_TYPE unsigned long int 2025-05-07T19:46:31.5006081Z #define _EXCEPTION_DEFINES_H 1 2025-05-07T19:46:31.5006286Z #define _GCC_LIMITS_H_ 2025-05-07T19:46:31.5006375Z #define __LDBL_DIG__ 18 2025-05-07T19:46:31.5006997Z #define __BIT_TYPES_DEFINED__ 1 2025-05-07T19:46:31.5007122Z #define __malloc_and_calloc_defined 2025-05-07T19:46:31.5007218Z #define __FLT64_IS_IEC_60559__ 2 2025-05-07T19:46:31.5007326Z #define _GLIBCXX_HAVE_SYS_SYSINFO_H 1 2025-05-07T19:46:31.5007428Z #define __x86_64__ 1 2025-05-07T19:46:31.5007516Z #define _SIZE_T_ 2025-05-07T19:46:31.5008529Z #define __bswap_constant_64(x) (__extension__ ((((x) & 0xff00000000000000ull) >> 56) | (((x) & 0x00ff000000000000ull) >> 40) | (((x) & 0x0000ff0000000000ull) >> 24) | (((x) & 0x000000ff00000000ull) >> 8) | (((x) & 0x00000000ff000000ull) << 8) | (((x) & 0x0000000000ff0000ull) << 24) | (((x) & 0x000000000000ff00ull) << 40) | (((x) & 0x00000000000000ffull) << 56))) 2025-05-07T19:46:31.5008651Z #define _POSIX2_COLL_WEIGHTS_MAX 2 2025-05-07T19:46:31.5008751Z #define __FLT32X_MIN_EXP__ (-1021) 2025-05-07T19:46:31.5008870Z #define 
__PTHREAD_RWLOCK_INT_FLAGS_SHARED 1 2025-05-07T19:46:31.5008996Z #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF 2025-05-07T19:46:31.5009117Z #define _IO_iconv_t _G_iconv_t 2025-05-07T19:46:31.5009231Z #define _GLIBCXX_FLOAT_IS_IEEE_BINARY32 1 2025-05-07T19:46:31.5009361Z #define __cpp_lib_make_reverse_iterator 201402 2025-05-07T19:46:31.5009522Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(A) 2025-05-07T19:46:31.5009621Z #define _GLIBCXX_HAVE_DLFCN_H 1 2025-05-07T19:46:31.5010140Z #define strdupa(s) (__extension__ ({ const char *__old = (s); size_t __len = strlen (__old) + 1; char *__new = (char *) __builtin_alloca (__len); (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:31.5010285Z #define __no_return__ __attribute__((noreturn)) 2025-05-07T19:46:31.5010438Z #define __device_builtin__ __location__(device_builtin) 2025-05-07T19:46:31.5010543Z #define _PSTL_HIDE_FROM_ABI_POP 2025-05-07T19:46:31.5010642Z #define _GLIBCXX_HAVE_ACOSF 1 2025-05-07T19:46:31.5010746Z #define STA_FLL 0x0008 2025-05-07T19:46:31.5010898Z #define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 2025-05-07T19:46:31.5011002Z #define _GLIBCXX_END_EXTERN_C } 2025-05-07T19:46:31.5011147Z #define __INT_FAST16_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5011262Z #define __cpp_lib_integer_sequence 201304 2025-05-07T19:46:31.5011350Z #define __stub_revoke 2025-05-07T19:46:31.5011446Z #define __timer_t_defined 1 2025-05-07T19:46:31.5011600Z #define _GLIBCXX11_DEPRECATED _GLIBCXX_DEPRECATED 2025-05-07T19:46:31.5011694Z #define INT_MAX __INT_MAX__ 2025-05-07T19:46:31.5011805Z #define ULLONG_MAX (LLONG_MAX * 2ULL + 1) 2025-05-07T19:46:31.5011929Z #define _GLIBCXX_END_NAMESPACE_CXX11 } 2025-05-07T19:46:31.5012031Z #define _GLIBCXX_ICONV_CONST 2025-05-07T19:46:31.5012137Z #define major(dev) gnu_dev_major (dev) 2025-05-07T19:46:31.5012251Z #define cudaArrayTextureGather 0x08 2025-05-07T19:46:31.5012374Z #define _GLIBCXX_LT_OBJDIR ".libs/" 2025-05-07T19:46:31.5012527Z #define __inline_hint__ 
__attribute__((nv_inline_hint)) 2025-05-07T19:46:31.5012628Z #define __NV_LEGACY_LAUNCH 1 2025-05-07T19:46:31.5012742Z #define _IO_off_t __off_t 2025-05-07T19:46:31.5012834Z #define __FLT64_DIG__ 15 2025-05-07T19:46:31.5013075Z #define PTHREAD_DESTRUCTOR_ITERATIONS _POSIX_THREAD_DESTRUCTOR_ITERATIONS 2025-05-07T19:46:31.5013174Z #define _POSIX2_LINE_MAX 2048 2025-05-07T19:46:31.5013429Z #define __UINT_FAST32_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.5013553Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:46:31.5013650Z #define ADJ_FREQUENCY 0x0002 2025-05-07T19:46:31.5013768Z #define __CUDART_API_PTDS(api) api 2025-05-07T19:46:31.5013854Z #define NULL __null 2025-05-07T19:46:31.5013986Z #define cudaStreamPerThread ((cudaStream_t)0x2) 2025-05-07T19:46:31.5014108Z #define _GLIBCXX_CONSTEXPR constexpr 2025-05-07T19:46:31.5014207Z #define __U64_TYPE unsigned long int 2025-05-07T19:46:31.5014302Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.5014396Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:46:31.5014495Z #define FP_ZERO 2 2025-05-07T19:46:31.5014592Z #define _GLIBCXX_HAVE_FLOORL 1 2025-05-07T19:46:31.5014807Z #define __isgraph_l(c,l) __isctype_l((c), _ISgraph, (l)) 2025-05-07T19:46:31.5014980Z #define __LONG_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5015066Z #define __WCHAR_T__ 2025-05-07T19:46:31.5015164Z #define __FLT64X_HAS_DENORM__ 1 2025-05-07T19:46:31.5015370Z #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL 2025-05-07T19:46:31.5015542Z #define _GLIBCXX_NORETURN __attribute__ ((__noreturn__)) 2025-05-07T19:46:31.5015641Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:46:31.5015765Z #define __GNUC_EXECUTION_CHARSET_NAME "UTF-8" 2025-05-07T19:46:31.5015902Z #define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:31.5016034Z #define __WSTOPSIG(status) __WEXITSTATUS(status) 2025-05-07T19:46:31.5016163Z #define cudaSurfaceTypeCubemapLayered 0xFC 2025-05-07T19:46:31.5016258Z #define _BSD_PTRDIFF_T_ 
2025-05-07T19:46:31.5016369Z #define _SIGSET_H_types 1 2025-05-07T19:46:31.5016486Z #define cudaTextureType1DLayered 0xF1 2025-05-07T19:46:31.5016598Z #define __cpp_unicode_literals 200710L 2025-05-07T19:46:31.5016769Z #define __isdigit_l(c,l) __isctype_l((c), _ISdigit, (l)) 2025-05-07T19:46:31.5016876Z #define __LONG_LONG_PAIR(HI,LO) LO, HI 2025-05-07T19:46:31.5016999Z #define __UINT_FAST16_TYPE__ long unsigned int 2025-05-07T19:46:31.5017169Z #define __bos0(ptr) __builtin_object_size (ptr, 0) 2025-05-07T19:46:31.5017283Z #define __DEC64_MAX__ 9.999999999999999E384DD 2025-05-07T19:46:31.5017428Z #define M_1_PIl 0.318309886183790671537767526745028724L 2025-05-07T19:46:31.5017543Z #define __CUDACC_DEVICE_ATOMIC_BUILTINS__ 1 2025-05-07T19:46:31.5017768Z #define WIFSTOPPED(status) __WIFSTOPPED (__WAIT_INT (status)) 2025-05-07T19:46:31.5017877Z #define __INT_FAST32_WIDTH__ 64 2025-05-07T19:46:31.5017995Z #define _POSIX2_CHARCLASS_NAME_MAX 14 2025-05-07T19:46:31.5018139Z #define _GLIBCXX_BITS_STD_ABS_H 2025-05-07T19:46:31.5018238Z #define STA_MODE 0x4000 2025-05-07T19:46:31.5018371Z #define __CHAR16_TYPE__ short unsigned int 2025-05-07T19:46:31.5018497Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:46:31.5018654Z #define __glibcxx_signed_b(T,B) ((T)(-1) < 0) 2025-05-07T19:46:31.5018772Z #define __USING_NAMESPACE_C99(name) 2025-05-07T19:46:31.5018892Z #define BIG_ENDIAN __BIG_ENDIAN 2025-05-07T19:46:31.5019039Z #define __cudaCDP2EventRecord_ptsz 2025-05-07T19:46:31.5019149Z #define _GLIBCXX_HAVE_SINL 1 2025-05-07T19:46:31.5019284Z #define EXPR_NEST_MAX _POSIX2_EXPR_NEST_MAX 2025-05-07T19:46:31.5019458Z #define __SIZE_WIDTH__ 64 2025-05-07T19:46:31.5019626Z #define __BLKSIZE_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.5019893Z #define __SEG_FS 1 2025-05-07T19:46:31.5020003Z #define _IO_size_t size_t 2025-05-07T19:46:31.5020153Z #define __INT_LEAST16_MAX__ 0x7fff 2025-05-07T19:46:31.5020274Z #define INT_MIN (-INT_MAX - 1) 2025-05-07T19:46:31.5020382Z #define 
__stub_lchmod 2025-05-07T19:46:31.5020509Z #define __DEC64_MANT_DIG__ 16 2025-05-07T19:46:31.5020640Z #define __INT64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5020743Z #define _GLIBCXX_MANGLE_SIZE_T m 2025-05-07T19:46:31.5020836Z #define __SEG_GS 1 2025-05-07T19:46:31.5021052Z #define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32 2025-05-07T19:46:31.5021147Z #define _IOS_APPEND 8 2025-05-07T19:46:31.5021248Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:46:31.5021345Z #define _GLIBCXX_RELEASE 11 2025-05-07T19:46:31.5021467Z #define _GLIBCXX98_USE_C99_WCHAR 1 2025-05-07T19:46:31.5021572Z #define _IO_IS_APPENDING 0x1000 2025-05-07T19:46:31.5021676Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:46:31.5021783Z #define htole16(x) (x) 2025-05-07T19:46:31.5021897Z #define __TEXTURE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:31.5021995Z #define _GLIBCXX_HAVE_FCNTL_H 1 2025-05-07T19:46:31.5022097Z #define __INT16_TYPE__ short int 2025-05-07T19:46:31.5022218Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:46:31.5022327Z #define __glibcxx_class_requires(_a,_b) 2025-05-07T19:46:31.5022442Z #define __cpp_structured_bindings 201606L 2025-05-07T19:46:31.5022641Z #define __align__(n) __attribute__((aligned(n))) 2025-05-07T19:46:31.5022733Z #define __SIZEOF_INT__ 4 2025-05-07T19:46:31.5022893Z #define __WCLONE 0x80000000 2025-05-07T19:46:31.5023003Z #define __DEC32_MAX_EXP__ 97 2025-05-07T19:46:31.5023090Z #define SEEK_HOLE 4 2025-05-07T19:46:31.5023182Z #define TIMER_ABSTIME 1 2025-05-07T19:46:31.5023278Z #define __INT_FAST8_MAX__ 0x7f 2025-05-07T19:46:31.5023385Z #define __CUDA_MATH_CRTIMP 2025-05-07T19:46:31.5023569Z #define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:46:31.5023693Z #define __INTPTR_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5023829Z #define __DRIVER_FUNCTIONS_H__ 2025-05-07T19:46:31.5023942Z #define __cpp_sized_deallocation 201309L 2025-05-07T19:46:31.5024054Z #define __MATH_FUNCTIONS_HPP__ 
2025-05-07T19:46:31.5024192Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:46:31.5024333Z #define _LINUX_LIMITS_H 2025-05-07T19:46:31.5024431Z #define linux 1 2025-05-07T19:46:31.5024539Z #define MOD_MICRO ADJ_MICRO 2025-05-07T19:46:31.5024706Z #define _GLIBCXX_DEBUG_ASSERT(_Condition) 2025-05-07T19:46:31.5024826Z #define _GLIBCXX_HAVE_VSWSCANF 1 2025-05-07T19:46:31.5024938Z #define _GLIBCXX_HAVE_ISNAN 1 2025-05-07T19:46:31.5025064Z #define _XOPEN_IOV_MAX _POSIX_UIO_MAXIOV 2025-05-07T19:46:31.5025263Z #define __cudart_builtin__ __location__(cudart_builtin) 2025-05-07T19:46:31.5025379Z #define __cpp_lib_hypot 201603 2025-05-07T19:46:31.5025492Z #define __FLT64_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.5025637Z #define _GLIBCXX_HAVE_WCTYPE_H 1 2025-05-07T19:46:31.5025747Z #define MOD_NANO ADJ_NANO 2025-05-07T19:46:31.5025848Z #define htole64(x) (x) 2025-05-07T19:46:31.5041847Z #define FP_ILOGBNAN (-2147483647 - 1) 2025-05-07T19:46:31.5042048Z #define _IO_stdout ((_IO_FILE*)(&_IO_2_1_stdout_)) 2025-05-07T19:46:31.5042148Z #define _IO_UPPERCASE 01000 2025-05-07T19:46:31.5042646Z #define cudaKernelNodeAttributeClusterSchedulingPolicyPreference cudaLaunchAttributeClusterSchedulingPolicyPreference 2025-05-07T19:46:31.5042758Z #define __USE_POSIX2 1 2025-05-07T19:46:31.5042856Z #define MOD_ESTERROR ADJ_ESTERROR 2025-05-07T19:46:31.5042945Z #define __WALL 0x40000000 2025-05-07T19:46:31.5043040Z #define _GLIBCXX_HAVE_LDEXPF 1 2025-05-07T19:46:31.5043135Z #define _XLOCALE_H 1 2025-05-07T19:46:31.5043226Z #define _GLIBCXX_USE_TMPNAM 1 2025-05-07T19:46:31.5043320Z #define __FLT32_MIN_10_EXP__ (-37) 2025-05-07T19:46:31.5043423Z #define __KEY_T_TYPE __S32_TYPE 2025-05-07T19:46:31.5043522Z #define __cudaGet_threadIdx() threadIdx 2025-05-07T19:46:31.5043608Z #define __EXCEPTIONS 1 2025-05-07T19:46:31.5043704Z #define __CUDART_API_PTSZ(api) api 2025-05-07T19:46:31.5043909Z #define __launch_bounds__(...) 
__annotate__(launch_bounds(__VA_ARGS__)) 2025-05-07T19:46:31.5043990Z #define __WORDSIZE 64 2025-05-07T19:46:31.5044077Z #define CLOCK_MONOTONIC 1 2025-05-07T19:46:31.5044173Z #define _STL_RELOPS_H 1 2025-05-07T19:46:31.5044263Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:46:31.5044357Z #define __BEGIN_DECLS extern "C" { 2025-05-07T19:46:31.5044456Z #define _GLIBCXX_HAVE_SYS_IPC_H 1 2025-05-07T19:46:31.5044551Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:46:31.5044648Z #define _GLIBCXX_HAVE_TRUNCATE 1 2025-05-07T19:46:31.5044950Z #define cudaKernelNodeAttributeClusterDimension cudaLaunchAttributeClusterDimension 2025-05-07T19:46:31.5045193Z #define _PSTL_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:31.5045310Z #define _GLIBCXX_NAMESPACE_CXX11 __cxx11:: 2025-05-07T19:46:31.5045403Z #define _GLIBCXX_NUMERIC_LIMITS 1 2025-05-07T19:46:31.5045510Z #define __cpp_range_based_for 201603L 2025-05-07T19:46:31.5045615Z #define __cpp_lib_exchange_function 201304 2025-05-07T19:46:31.5045715Z #define _GLIBCXX_HAVE_INTTYPES_H 1 2025-05-07T19:46:31.5045820Z #define _GLIBCXX_DARWIN_USE_64_BIT_INODE 1 2025-05-07T19:46:31.5046008Z #define cudaCooperativeLaunchMultiDeviceNoPostSync 0x02 2025-05-07T19:46:31.5046101Z #define __FLT64_HAS_INFINITY__ 1 2025-05-07T19:46:31.5046187Z #define _GLIBCXX_CSTDLIB 1 2025-05-07T19:46:31.5046432Z #define _GLIBCXX_DEBUG_MACRO_SWITCH_H 1 2025-05-07T19:46:31.5046658Z #define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:46:31.5046772Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16 2025-05-07T19:46:31.5046854Z #define _STRING_H 1 2025-05-07T19:46:31.5046968Z #define _BITS_PTHREADTYPES_H 1 2025-05-07T19:46:31.5047052Z #define _GCC_MAX_ALIGN_T 2025-05-07T19:46:31.5047144Z #define __SM_32_INTRINSICS_HPP__ 2025-05-07T19:46:31.5047295Z #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) 2025-05-07T19:46:31.5047383Z #define __code_model_small__ 1 2025-05-07T19:46:31.5047469Z #define 
_PSTL_CONFIG_H 2025-05-07T19:46:31.5047579Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:31.5047694Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:46:31.5047784Z #define __SM_20_INTRINSICS_H__ 2025-05-07T19:46:31.5047878Z #define cudaCpuDeviceId ((int)-1) 2025-05-07T19:46:31.5048234Z #define assert(expr) ((expr) ? __ASSERT_VOID_CAST (0) : __assert_fail (__STRING(expr), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:31.5048326Z #define __DEC32_MANT_DIG__ 7 2025-05-07T19:46:31.5048412Z #define le64toh(x) (x) 2025-05-07T19:46:31.5048514Z #define FILENAME_MAX 4096 2025-05-07T19:46:31.5048661Z #define __iscntrl_l(c,l) __isctype_l((c), _IScntrl, (l)) 2025-05-07T19:46:31.5048769Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:46:31.5048849Z #define L_cuserid 9 2025-05-07T19:46:31.5048945Z #define __ino_t_defined 2025-05-07T19:46:31.5049023Z #define __k8__ 1 2025-05-07T19:46:31.5049114Z #define __INTPTR_TYPE__ long int 2025-05-07T19:46:31.5049230Z #define __UINT16_TYPE__ short unsigned int 2025-05-07T19:46:31.5049315Z #define __int8_t_defined 2025-05-07T19:46:31.5049400Z #define __WCHAR_TYPE__ int 2025-05-07T19:46:31.5049493Z #define __CLOCKID_T_TYPE __S32_TYPE 2025-05-07T19:46:31.5049612Z #define cudaHostRegisterPortable 0x01 2025-05-07T19:46:31.5049705Z #define __SLONGWORD_TYPE long int 2025-05-07T19:46:31.5049818Z #define _GLIBCXX_PACKAGE_TARNAME "libstdc++" 2025-05-07T19:46:31.5049976Z #define __isblank_l(c,l) __isctype_l((c), _ISblank, (l)) 2025-05-07T19:46:31.5050061Z #define __HAVE_COLUMN 2025-05-07T19:46:31.5050145Z #define __stub_fdetach 2025-05-07T19:46:31.5050562Z #define __CUDACC_VER__ "__CUDACC_VER__ is no longer supported. Use __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and __CUDACC_VER_BUILD__ instead." 
2025-05-07T19:46:31.5050655Z #define __pic__ 2 2025-05-07T19:46:31.5050785Z #define __UINTPTR_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.5050893Z #define CLOCKS_PER_SEC 1000000l 2025-05-07T19:46:31.5051008Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:46:31.5051120Z #define _GLIBCXX_HAVE_SOCKATMARK 1 2025-05-07T19:46:31.5051217Z #define __stub_chflags 2025-05-07T19:46:31.5051336Z #define CLOCK_BOOTTIME 7 2025-05-07T19:46:31.5051431Z #define __need_IOV_MAX 2025-05-07T19:46:31.5051552Z #define putc(_ch,_fp) _IO_putc (_ch, _fp) 2025-05-07T19:46:31.5051664Z #define __UQUAD_TYPE unsigned long int 2025-05-07T19:46:31.5051797Z #define __cpp_decltype 200707L 2025-05-07T19:46:31.5051908Z #define __BYTE_ORDER __LITTLE_ENDIAN 2025-05-07T19:46:31.5052009Z #define _GLIBCXX_USE_C99 1 2025-05-07T19:46:31.5052136Z #define _GLIBCXX_TR1_BETA_FUNCTION_TCC 1 2025-05-07T19:46:31.5052229Z #define TTY_NAME_MAX 32 2025-05-07T19:46:31.5052405Z #define _GLIBCXX_FORWARD(_Tp,__val) std::forward<_Tp>(__val) 2025-05-07T19:46:31.5052538Z #define __INT_FAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5052732Z #define _PSTL_ASSERT(_Condition) __glibcxx_assert(_Condition) 2025-05-07T19:46:31.5052852Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:46:31.5052948Z #define __LITTLE_ENDIAN 1234 2025-05-07T19:46:31.5053045Z #define STA_PPSTIME 0x0004 2025-05-07T19:46:31.5053126Z #define __import__ 2025-05-07T19:46:31.5053212Z #define BUFSIZ _IO_BUFSIZ 2025-05-07T19:46:31.5053341Z #define M_SQRT2l 1.414213562373095048801688724209698079L 2025-05-07T19:46:31.5053428Z #define __export__ 2025-05-07T19:46:31.5053542Z #define __FSID_T_TYPE struct { int __val[2]; } 2025-05-07T19:46:31.5053694Z #define cudaMemAttachHost 0x02 2025-05-07T19:46:31.5053915Z #define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:46:31.5054008Z #define _GLIBCXX_HAVE_ICONV 1 2025-05-07T19:46:31.5054095Z #define _GLIBCXX_SYMVER 1 2025-05-07T19:46:31.5054187Z #define __FLT64X_MAX_EXP__ 16384 
2025-05-07T19:46:31.5054283Z #define _WCHAR_T_DECLARED 2025-05-07T19:46:31.5054398Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:46:31.5054511Z #define isalpha_l(c,l) __isalpha_l ((c), (l)) 2025-05-07T19:46:31.5054625Z #define __cpp_inline_variables 201606L 2025-05-07T19:46:31.5054712Z #define WNOWAIT 0x01000000 2025-05-07T19:46:31.5054802Z #define PLOSS 6 2025-05-07T19:46:31.5054903Z #define M_LN10 2.30258509299404568402 2025-05-07T19:46:31.5055199Z #define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) 2025-05-07T19:46:31.5055293Z #define EXIT_SUCCESS 0 2025-05-07T19:46:31.5055398Z #define __LDBL_REDIR_DECL(name) 2025-05-07T19:46:31.5055524Z #define _GLIBCXX_HAVE_STRTOF 1 2025-05-07T19:46:31.5055627Z #define MOD_FREQUENCY ADJ_FREQUENCY 2025-05-07T19:46:31.5055728Z #define __thread__ __thread 2025-05-07T19:46:31.5055856Z #define _GLIBCXX_HAVE_MEMORY_H 1 2025-05-07T19:46:31.5055949Z #define __INT_MAX__ 0x7fffffff 2025-05-07T19:46:31.5056064Z #define __SIZEOF_PTHREAD_BARRIER_T 32 2025-05-07T19:46:31.5056306Z #define __glibcxx_requires_partitioned_upper_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:31.5056454Z #define __cudaCDP2StreamWaitEvent_ptsz 2025-05-07T19:46:31.5056550Z #define _GLIBCXX_HAVE_SINF 1 2025-05-07T19:46:31.5056643Z #define __linux__ 1 2025-05-07T19:46:31.5056775Z #define STA_PPSSIGNAL 0x0100 2025-05-07T19:46:31.5056898Z #define M_LN2l 0.693147180559945309417232121458176568L 2025-05-07T19:46:31.5056988Z #define __S16_TYPE short int 2025-05-07T19:46:31.5057351Z #define __glibcxx_constexpr_assert(cond) if (__builtin_is_constant_evaluated() && !bool(cond)) __builtin_unreachable() 2025-05-07T19:46:31.5057500Z #define __NVCC_DIAG_PRAGMA_SUPPORT__ 1 2025-05-07T19:46:31.5057694Z #define __bos(ptr) __builtin_object_size (ptr, __USE_FORTIFY_LEVEL > 1) 2025-05-07T19:46:31.5057797Z #define __COMMON_FUNCTIONS_H__ 2025-05-07T19:46:31.5057931Z #define UINT_MAX (INT_MAX * 2U + 1U) 
2025-05-07T19:46:31.5058019Z #define _T_SIZE_ 2025-05-07T19:46:31.5058127Z #define LLONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:31.5058256Z #define __cudaCDP2StreamCreateWithFlags 2025-05-07T19:46:31.5058387Z #define _PSTL_VERSION 12000 2025-05-07T19:46:31.5058510Z #define __noinline__ __attribute__((noinline)) 2025-05-07T19:46:31.5058609Z #define __WNOTHREAD 0x20000000 2025-05-07T19:46:31.5058743Z #define _G_va_list __gnuc_va_list 2025-05-07T19:46:31.5058878Z #define M_PI_4l 0.785398163397448309615660845819875721L 2025-05-07T19:46:31.5058965Z #define _IOS_INPUT 1 2025-05-07T19:46:31.5059081Z #define __USE_LARGEFILE64 1 2025-05-07T19:46:31.5059191Z #define _GLIBCXX_TR1_EXP_INTEGRAL_TCC 1 2025-05-07T19:46:31.5059374Z #define __INT64_TYPE__ long int 2025-05-07T19:46:31.5059486Z #define _POSIX_SSIZE_MAX 32767 2025-05-07T19:46:31.5059614Z #define __shared__ __location__(shared) 2025-05-07T19:46:31.5059890Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:46:31.5060062Z #define __glibc_unlikely(cond) __builtin_expect((cond), 0) 2025-05-07T19:46:31.5060190Z #define __gid_t_defined 2025-05-07T19:46:31.5060320Z #define _GLIBCXX_USE_SC_NPROCESSORS_ONLN 1 2025-05-07T19:46:31.5060435Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:46:31.5060676Z #define __glibcxx_requires_can_increment_range(_First1,_Last1,_First2) 2025-05-07T19:46:31.5060810Z #define _GLIBCXX17_INLINE inline 2025-05-07T19:46:31.5060916Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:46:31.5061021Z #define ___int_size_t_h 2025-05-07T19:46:31.5061159Z #define __FSBLKCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:31.5061299Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:46:31.5061470Z #define __WIFCONTINUED(status) ((status) == __W_CONTINUED) 2025-05-07T19:46:31.5061584Z #define CUDA_DOUBLE_MATH_FUNCTIONS 1 2025-05-07T19:46:31.5061777Z #define _GLIBCXX_HAVE_FENV_H 1 2025-05-07T19:46:31.5061892Z #define _GLIBCXX_HAVE_STDBOOL_H 1 2025-05-07T19:46:31.5062056Z #define __SIZEOF_FLOAT128__ 16 
2025-05-07T19:46:31.5062222Z #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5062347Z #define _GLIBCXX_TR1_HYPERGEOMETRIC_TCC 1 2025-05-07T19:46:31.5062481Z #define _GLIBCXX_DEBUG_PEDASSERT(_Condition) 2025-05-07T19:46:31.5062581Z #define __clock_t_defined 1 2025-05-07T19:46:31.5062709Z #define _POSIX_SEM_VALUE_MAX 32767 2025-05-07T19:46:31.5062828Z #define __cudaCDP2RuntimeGetVersion 2025-05-07T19:46:31.5062932Z #define __GLIBC_MINOR__ 17 2025-05-07T19:46:31.5063050Z #define __DEC64_MIN__ 1E-383DD 2025-05-07T19:46:31.5063157Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:46:31.5063285Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:46:31.5063422Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:46:31.5063617Z #define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:46:31.5063721Z #define __SSE__ 1 2025-05-07T19:46:31.5063834Z #define SEM_VALUE_MAX (2147483647) 2025-05-07T19:46:31.5063977Z #define M_SQRT1_2 0.70710678118654752440 2025-05-07T19:46:31.5064070Z #define _CTYPE_H 1 2025-05-07T19:46:31.5064181Z #define __sigset_t_defined 2025-05-07T19:46:31.5064325Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:46:31.5064431Z #define _GLIBCXX_HAVE_LOGF 1 2025-05-07T19:46:31.5064527Z #define MOD_TAI ADJ_TAI 2025-05-07T19:46:31.5064639Z #define _IO_va_list __gnuc_va_list 2025-05-07T19:46:31.5064770Z #define _GLIBCXX_HAVE_LOGL 1 2025-05-07T19:46:31.5064866Z #define __SM_70_RT_H__ 2025-05-07T19:46:31.5064971Z #define _GLIBCXX_HAVE_WRITEV 1 2025-05-07T19:46:31.5065117Z #define cudaEventWaitDefault 0x00 2025-05-07T19:46:31.5065224Z #define _GLIBCXX_HAVE_EXPL 1 2025-05-07T19:46:31.5065389Z #define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:46:31.5065484Z #define _POSIX_MAX_CANON 255 2025-05-07T19:46:31.5065609Z #define _GLIBCXX_NOEXCEPT_PARM , bool _NE 2025-05-07T19:46:31.5065708Z #define FD_SETSIZE __FD_SETSIZE 2025-05-07T19:46:31.5065803Z #define _GLIBCXX_TXN_SAFE 
2025-05-07T19:46:31.5065902Z #define __amd64__ 1 2025-05-07T19:46:31.5065994Z #define __WINT_WIDTH__ 32 2025-05-07T19:46:31.5066100Z #define __CUDA_DEVICE_RUNTIME_API_H__ 2025-05-07T19:46:31.5066388Z #define __REDIRECT_NTHNL(name,proto,alias) name proto __THROWNL __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:31.5066502Z #define _GLIBCXX_STDIO_SEEK_CUR 1 2025-05-07T19:46:31.5066586Z #define EOF (-1) 2025-05-07T19:46:31.5066686Z #define __WAIT_STATUS_DEFN void * 2025-05-07T19:46:31.5066793Z #define __USE_POSIX199309 1 2025-05-07T19:46:31.5066887Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:46:31.5066980Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:46:31.5067075Z #define __FLT32X_MAX_10_EXP__ 308 2025-05-07T19:46:31.5067192Z #define LLONG_MIN (-LLONG_MAX-1) 2025-05-07T19:46:31.5067305Z #define cudaSurfaceType2DLayered 0xF2 2025-05-07T19:46:31.5067403Z #define ____mbstate_t_defined 1 2025-05-07T19:46:31.5067513Z #define STA_NANO 0x2000 2025-05-07T19:46:31.5067608Z #define _GLIBCXX_HAVE_LOG10F 1 2025-05-07T19:46:31.5067710Z #define _GLIBCXX_HAVE_LOG10L 1 2025-05-07T19:46:31.5067799Z #define _IO_LINKED 0x80 2025-05-07T19:46:31.5067909Z #define __cpp_lib_launder 201606 2025-05-07T19:46:31.5068000Z #define __SIZEOF_INT128__ 16 2025-05-07T19:46:31.5068096Z #define __PTHREAD_MUTEX_HAVE_PREV 1 2025-05-07T19:46:31.5068200Z #define __FLT64X_IS_IEC_60559__ 2 2025-05-07T19:46:31.5068296Z #define _GLIBCXX_TYPE_TRAITS 1 2025-05-07T19:46:31.5068441Z #define cudaGraphKernelNodePortProgrammatic 1 2025-05-07T19:46:31.5068555Z #define __DEVICE_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:31.5068673Z #define __BLKCNT64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:31.5068765Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:46:31.5068866Z #define __W_CONTINUED 0xffff 2025-05-07T19:46:31.5068986Z #define __ATOMIC_RELAXED 0 2025-05-07T19:46:31.5069125Z #define w_coredump __wait_terminated.__w_coredump 2025-05-07T19:46:31.5069304Z #define __FSBLKCNT_T_TYPE __SYSCALL_ULONG_TYPE 
2025-05-07T19:46:31.5069614Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessor 2025-05-07T19:46:31.5069832Z #define __DBL_EPSILON__ double(2.22044604925031308084726333618164062e-16L) 2025-05-07T19:46:31.5069916Z #define __stub_stty 2025-05-07T19:46:31.5070092Z #define _tolower(c) ((int) (*__ctype_tolower_loc ())[(int) (c)]) 2025-05-07T19:46:31.5070198Z #define le16toh(x) (x) 2025-05-07T19:46:31.5070308Z #define BC_SCALE_MAX _POSIX2_BC_SCALE_MAX 2025-05-07T19:46:31.5070491Z #define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128 2025-05-07T19:46:31.5070587Z #define _SIZET_ 2025-05-07T19:46:31.5070683Z #define XATTR_NAME_MAX 255 2025-05-07T19:46:31.5070771Z #define _SVID_SOURCE 1 2025-05-07T19:46:31.5070855Z #define _LP64 1 2025-05-07T19:46:31.5070962Z #define _LIBC_LIMITS_H_ 1 2025-05-07T19:46:31.5071218Z #define __REDIRECT_NTH_LDBL(name,proto,alias) __REDIRECT_NTH (name, proto, alias) 2025-05-07T19:46:31.5071332Z #define _GLIBCXX_TR1_BESSEL_FUNCTION_TCC 1 2025-05-07T19:46:31.5071435Z #define __UINT8_C(c) c 2025-05-07T19:46:31.5071538Z #define _GLIBCXX_HAVE_CEILF 1 2025-05-07T19:46:31.5071633Z #define _GLIBCXX_HAVE_CEILL 1 2025-05-07T19:46:31.5071745Z #define __cudaCDP2Memset3DAsync_ptsz 2025-05-07T19:46:31.5071853Z #define __CUDA_ARCH_LIST__ 520 2025-05-07T19:46:31.5072057Z #define __FLT64_MAX_EXP__ 1024 2025-05-07T19:46:31.5072152Z #define MOD_MAXERROR ADJ_MAXERROR 2025-05-07T19:46:31.5072244Z #define CUDARTAPI 2025-05-07T19:46:31.5072323Z #define IOV_MAX 1024 2025-05-07T19:46:31.5072467Z #define __glibcxx_requires_irreflexive2(_First,_Last) 2025-05-07T19:46:31.5072558Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:46:31.5072665Z #define P_tmpdir "/tmp" 2025-05-07T19:46:31.5072764Z #define cudaMemAttachSingle 0x04 2025-05-07T19:46:31.5072839Z #define __wchar_t__ 2025-05-07T19:46:31.5072953Z #define __cpp_lib_is_aggregate 201703 2025-05-07T19:46:31.5073030Z #define SEEK_END 2 2025-05-07T19:46:31.5073119Z #define __SIZEOF_WCHAR_T__ 4 
2025-05-07T19:46:31.5073288Z #define _GLIBCXX_USE_TBB_PAR_BACKEND __has_include() 2025-05-07T19:46:31.5073396Z #define _IO_ftrylockfile(_fp) 2025-05-07T19:46:31.5073538Z #define _GLIBCXX_USE_C99_WCHAR _GLIBCXX11_USE_C99_WCHAR 2025-05-07T19:46:31.5073625Z #define ____FILE_defined 1 2025-05-07T19:46:31.5073747Z #define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 2025-05-07T19:46:31.5073839Z #define __GNUC_PATCHLEVEL__ 0 2025-05-07T19:46:31.5073925Z #define _ISOC99_SOURCE 1 2025-05-07T19:46:31.5074016Z #define __VECTOR_FUNCTIONS_H__ 2025-05-07T19:46:31.5074274Z #define __REDIRECT_NTH(name,proto,alias) name proto __THROW __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:31.5074398Z #define _PSTL_USE_NONTEMPORAL_STORES_IF_ALLOWED 2025-05-07T19:46:31.5074487Z #define _IO_RIGHT 04 2025-05-07T19:46:31.5074593Z #define __END_NAMESPACE_STD 2025-05-07T19:46:31.5074775Z #define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:46:31.5074864Z #define _GLIBCXX_STD_C std 2025-05-07T19:46:31.5074996Z #define cudaInitDeviceFlagsAreValid 0x01 2025-05-07T19:46:31.5075085Z #define _LARGEFILE64_SOURCE 1 2025-05-07T19:46:31.5075185Z #define _GLIBCXX_USE_C99_STDINT_TR1 1 2025-05-07T19:46:31.5075267Z #define _STDDEF_H_ 2025-05-07T19:46:31.5075443Z #define __FLT64_NORM_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:46:31.5075539Z #define __FLT128_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.5075651Z #define isalnum_l(c,l) __isalnum_l ((c), (l)) 2025-05-07T19:46:31.5075862Z #define __FD_ISSET(d,set) ((__FDS_BITS (set)[__FD_ELT (d)] & __FD_MASK (d)) != 0) 2025-05-07T19:46:31.5075968Z #define __INTMAX_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:31.5076106Z #define __glibcxx_requires_irreflexive(_First,_Last) 2025-05-07T19:46:31.5076224Z #define cudaGraphKernelNodePortDefault 0 2025-05-07T19:46:31.5076335Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:46:31.5076439Z #define __cudaCDP2Memcpy3DAsync_ptsz 2025-05-07T19:46:31.5076530Z #define __PID_T_TYPE 
__S32_TYPE 2025-05-07T19:46:31.5076704Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:46:31.5076798Z #define CHARCLASS_NAME_MAX 2048 2025-05-07T19:46:31.5076939Z #define _GLIBCXX_HAVE_TANF 1 2025-05-07T19:46:31.5077032Z #define _GLIBCXX_USE_ST_MTIM 1 2025-05-07T19:46:31.5077219Z #define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x 2025-05-07T19:46:31.5077305Z #define __CUDA_RUNTIME_H__ 2025-05-07T19:46:31.5077482Z #define WIFSIGNALED(status) __WIFSIGNALED (__WAIT_INT (status)) 2025-05-07T19:46:31.5077593Z #define _GLIBCXX_HAVE_STDLIB_H 1 2025-05-07T19:46:31.5077685Z #define __STDCPP_THREADS__ 1 2025-05-07T19:46:31.5077824Z #define M_2_SQRTPIl 1.128379167095512573896158903121545172L 2025-05-07T19:46:31.5077928Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:46:31.5078018Z #define _POSIX_UIO_MAXIOV 16 2025-05-07T19:46:31.5078114Z #define _PSTL_PAR_BACKEND_SERIAL 2025-05-07T19:46:31.5078228Z #define __ASSERT_FUNCTION __PRETTY_FUNCTION__ 2025-05-07T19:46:31.5078330Z #define __FLT64_HAS_DENORM__ 1 2025-05-07T19:46:31.5078428Z #define __WORDSIZE_TIME64_COMPAT32 1 2025-05-07T19:46:31.5078591Z #define _GLIBCXX_DEPRECATED __attribute__ ((__deprecated__)) 2025-05-07T19:46:31.5078770Z #define __FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32 2025-05-07T19:46:31.5078865Z #define _PSTL_HIDE_FROM_ABI_PUSH 2025-05-07T19:46:31.5078979Z #define cudaStreamLegacy ((cudaStream_t)0x1) 2025-05-07T19:46:31.5079087Z #define _IO_cleanup_region_start(_fct,_fp) 2025-05-07T19:46:31.5079199Z #define __location__(a) __annotate__(a) 2025-05-07T19:46:31.5079430Z #define __device_builtin_surface_type__ __location__(device_builtin_surface_type) 2025-05-07T19:46:31.5079521Z #define _POSIX2_BC_BASE_MAX 99 2025-05-07T19:46:31.5079642Z #define __cudaCDP2DeviceGetAttribute 2025-05-07T19:46:31.5079731Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:46:31.5079817Z #define __STDC_UTF_32__ 1 2025-05-07T19:46:31.5079926Z #define __INT_FAST8_WIDTH__ 8 
2025-05-07T19:46:31.5080020Z #define NAN (__builtin_nanf ("")) 2025-05-07T19:46:31.5080109Z #define _POSIX_MQ_PRIO_MAX 32 2025-05-07T19:46:31.5080188Z #define __FXSR__ 1 2025-05-07T19:46:31.5080275Z #define _SIZE_T 2025-05-07T19:46:31.5080377Z #define _GLIBCXX_USE_GETTIMEOFDAY 1 2025-05-07T19:46:31.5080484Z #define cudaHostRegisterReadOnly 0x08 2025-05-07T19:46:31.5080666Z #define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:46:31.5080810Z #define __WIFSTOPPED(status) (((status) & 0xff) == 0x7f) 2025-05-07T19:46:31.5080898Z #define _IO_ssize_t __ssize_t 2025-05-07T19:46:31.5080991Z #define __ULONG32_TYPE unsigned int 2025-05-07T19:46:31.5081185Z #define __DBL_NORM_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:46:31.5081381Z #define cudaStreamGraphTailLaunch (cudaStream_t)0x0100000000000000 2025-05-07T19:46:31.5081467Z #define _GXX_NULLPTR_T 2025-05-07T19:46:31.5081597Z #define __glibcxx_class_requires3(_a,_b,_c,_d) 2025-05-07T19:46:31.5081680Z #define FOPEN_MAX 16 2025-05-07T19:46:31.5081769Z #define __BIG_ENDIAN 4321 2025-05-07T19:46:31.5081884Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:31.5081989Z #define __suseconds_t_defined 2025-05-07T19:46:31.5082078Z #define __off_t_defined 2025-05-07T19:46:31.5082158Z #define stderr stderr 2025-05-07T19:46:31.5082256Z #define M_LOG10E 0.43429448190325182765 2025-05-07T19:46:31.5082368Z #define __glibcxx_requires_string(_String) 2025-05-07T19:46:31.5082467Z #define _GLIBCXX_HAVE_LDEXPL 1 2025-05-07T19:46:31.5082554Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:46:31.5082986Z #define _PSTL_CPP14_2RANGE_MISMATCH_EQUAL_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201300L || __cpp_lib_robust_nonmodifying_seq_ops == 201304) 2025-05-07T19:46:31.5083072Z #define __mode_t_defined 2025-05-07T19:46:31.5083156Z #define _GCC_SIZE_T 2025-05-07T19:46:31.5083259Z #define __INO64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:31.5083359Z #define __cpp_runtime_arrays 198712L 
2025-05-07T19:46:31.5083463Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:46:31.5083564Z #define __USE_XOPEN2K8XSI 1 2025-05-07T19:46:31.5083703Z #define __UINT32_C(c) c ## U 2025-05-07T19:46:31.5083804Z #define __cpp_alias_templates 200704L 2025-05-07T19:46:31.5083955Z #define cudaHostAllocMapped 0x02 2025-05-07T19:46:31.5084066Z #define __DEVICE_LAUNCH_PARAMETERS_H__ 2025-05-07T19:46:31.5084155Z #define _STL_ITERATOR_H 1 2025-05-07T19:46:31.5084237Z #define __size_t__ 2025-05-07T19:46:31.5084373Z #define cudaStreamAttrID cudaLaunchAttributeID 2025-05-07T19:46:31.5084462Z #define _GLIBCXX_HAVE_ATANF 1 2025-05-07T19:46:31.5084564Z #define cudaEventRecordExternal 0x01 2025-05-07T19:46:31.5084710Z #define __isspace_l(c,l) __isctype_l((c), _ISspace, (l)) 2025-05-07T19:46:31.5084802Z #define _IO_BUFSIZ _G_BUFSIZ 2025-05-07T19:46:31.5084974Z #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F 2025-05-07T19:46:31.5085055Z #define _ENDIAN_H 1 2025-05-07T19:46:31.5085162Z #define __builtin_align__(a) __align__(a) 2025-05-07T19:46:31.5085257Z #define _GLIBCXX20_CONSTEXPR 2025-05-07T19:46:31.5085357Z #define __NV_NO_HOST_COMPILER_CHECK 1 2025-05-07T19:46:31.5085446Z #define __try try 2025-05-07T19:46:31.5085537Z #define _GLIBCXX_HAVE_FINITE 1 2025-05-07T19:46:31.5085626Z #define __FLT128_IS_IEC_60559__ 2 2025-05-07T19:46:31.5085711Z #define __INT8_MAX__ 0x7f 2025-05-07T19:46:31.5085978Z #define cudaStreamGetCaptureInfo __CUDART_API_PTSZ(cudaStreamGetCaptureInfo_v2) 2025-05-07T19:46:31.5086062Z #define __LONG_WIDTH__ 64 2025-05-07T19:46:31.5086140Z #define __PIC__ 2 2025-05-07T19:46:31.5086258Z #define BC_STRING_MAX _POSIX2_BC_STRING_MAX 2025-05-07T19:46:31.5086370Z #define __UINT_FAST32_TYPE__ long unsigned int 2025-05-07T19:46:31.5086493Z #define FD_ISSET(fd,fdsetp) __FD_ISSET (fd, fdsetp) 2025-05-07T19:46:31.5086582Z #define _GLIBCXX_HAVE_FLOAT_H 1 2025-05-07T19:46:31.5086687Z #define _GLIBCXX_HAVE_ATANL 1 2025-05-07T19:46:31.5086865Z #define 
__FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:46:31.5086959Z #define __DEVICE_FUNCTIONS_HPP__ 2025-05-07T19:46:31.5087064Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:46:31.5087150Z #define _IO_uid_t __uid_t 2025-05-07T19:46:31.5087243Z #define _GLIBCXX_HAVE_READLINK 1 2025-05-07T19:46:31.5087382Z #define __cudaCDP2EventRecordWithFlags_ptsz 2025-05-07T19:46:31.5087469Z #define _CONCEPT_CHECK_H 1 2025-05-07T19:46:31.5087610Z #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:46:31.5087707Z #define _GLIBCXX_HAVE_NETINET_IN_H 1 2025-05-07T19:46:31.5087835Z #define _GLIBCXX_TR1_SPECIAL_FUNCTION_UTIL_H 1 2025-05-07T19:46:31.5087915Z #define LONG_BIT 64 2025-05-07T19:46:31.5088023Z #define __SIZEOF_PTHREAD_BARRIERATTR_T 4 2025-05-07T19:46:31.5088130Z #define _GLIBCXX_USE_ALLOCATOR_NEW 1 2025-05-07T19:46:31.5088251Z #define __cpp_lib_math_special_functions 201603L 2025-05-07T19:46:31.5088338Z #define __fsfilcnt_t_defined 2025-05-07T19:46:31.5088427Z #define __blkcnt_t_defined 2025-05-07T19:46:31.5088707Z #define cudaKernelNodeAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:31.5088792Z #define __USE_LARGEFILE 1 2025-05-07T19:46:31.5088889Z #define __cpp_constexpr 201603L 2025-05-07T19:46:31.5088995Z #define CUDART_VERSION 12080 2025-05-07T19:46:31.5089084Z #define NL_TEXTMAX INT_MAX 2025-05-07T19:46:31.5089180Z #define cudaDeviceMapHost 0x08 2025-05-07T19:46:31.5089263Z #define _GLIBCXX_CMATH 1 2025-05-07T19:46:31.5089470Z #define __attribute_format_arg__(x) __attribute__ ((__format_arg__ (x))) 2025-05-07T19:46:31.5089556Z #define __lldiv_t_defined 1 2025-05-07T19:46:31.5089633Z #define __SSE2__ 1 2025-05-07T19:46:31.5089722Z #define _IOLBF 1 2025-05-07T19:46:31.5089816Z #define _GLIBCXX_HAVE_SYS_TYPES_H 1 2025-05-07T19:46:31.5089907Z #define _GLIBCXX_HAVE_FLOORF 1 2025-05-07T19:46:31.5090007Z #define __cpp_deduction_guides 201703L 2025-05-07T19:46:31.5090113Z #define 
_GLIBCXX_HAVE_EXPF 1 2025-05-07T19:46:31.5090215Z #define __annotate__(a) __attribute__((a)) 2025-05-07T19:46:31.5090298Z #define __INT32_TYPE__ int 2025-05-07T19:46:31.5090398Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:46:31.5090498Z #define cudaDeviceSyncMemops 0x80 2025-05-07T19:46:31.5090648Z #define __cpp_exceptions 199711L 2025-05-07T19:46:31.5090795Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:46:31.5090915Z #define cudaDeviceScheduleYield 0x02 2025-05-07T19:46:31.5091001Z #define _SYS_SYSMACROS_H 1 2025-05-07T19:46:31.5091112Z #define _GLIBCXX_TR1_LEGENDRE_FUNCTION_TCC 1 2025-05-07T19:46:31.5091281Z #define __FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64 2025-05-07T19:46:31.5091372Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:46:31.5091461Z #define __SWORD_TYPE long int 2025-05-07T19:46:31.5091547Z #define __INTMAX_TYPE__ long int 2025-05-07T19:46:31.5091651Z #define _GLIBCXX11_USE_C99_MATH 1 2025-05-07T19:46:31.5091741Z #define __PTHREAD_SPINS 0, 0 2025-05-07T19:46:31.5091828Z #define _BITS_POSIX1_LIM_H 1 2025-05-07T19:46:31.5092122Z #define cudaStreamAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:31.5092215Z #define __DEC128_MAX_EXP__ 6145 2025-05-07T19:46:31.5092358Z #define math_errhandling (MATH_ERRNO | MATH_ERREXCEPT) 2025-05-07T19:46:31.5092450Z #define _T_SIZE 2025-05-07T19:46:31.5092553Z #define cudaHostAllocDefault 0x00 2025-05-07T19:46:31.5092674Z #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) 2025-05-07T19:46:31.5092792Z #define __va_arg_pack() __builtin_va_arg_pack () 2025-05-07T19:46:31.5092893Z #define _POSIX_TIMER_MAX 32 2025-05-07T19:46:31.5092981Z #define _GLIBCXX_HAVE_TLS 1 2025-05-07T19:46:31.5093097Z #define _GLIBCXX_NOTHROW _GLIBCXX_USE_NOEXCEPT 2025-05-07T19:46:31.5093199Z #define __FLT32X_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.5093285Z #define __ATOMIC_CONSUME 1 2025-05-07T19:46:31.5093457Z #define __CUDA_ARCH_HAS_FEATURE__(_FEAT) __CUDA_ARCH_FEAT_ ##_FEAT 2025-05-07T19:46:31.5093539Z 
#define __GNUC_MINOR__ 4 2025-05-07T19:46:31.5093647Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:46:31.5093738Z #define __INT_FAST16_WIDTH__ 64 2025-05-07T19:46:31.5093849Z #define __UINTMAX_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.5093935Z #define __PIE__ 2 2025-05-07T19:46:31.5094037Z #define LITTLE_ENDIAN __LITTLE_ENDIAN 2025-05-07T19:46:31.5094132Z #define _GLIBCXX_HAVE_INT64_T_LONG 1 2025-05-07T19:46:31.5094321Z #define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x 2025-05-07T19:46:31.5094551Z #define __intN_t(N,MODE) typedef int int ##N ##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:31.5094637Z #define __nlink_t_defined 2025-05-07T19:46:31.5094760Z #define _GLIBCXX17_DEPRECATED [[__deprecated__]] 2025-05-07T19:46:31.5094879Z #define _PSTL_STRING(x) _PSTL_STRING_AUX(x) 2025-05-07T19:46:31.5094962Z #define _XOPEN_LIM_H 1 2025-05-07T19:46:31.5095221Z #define __u_intN_t(N,MODE) typedef unsigned int u_int ##N ##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:31.5095348Z #define __cpp_template_template_args 201611L 2025-05-07T19:46:31.5095453Z #define _GTHREAD_USE_MUTEX_TIMEDLOCK 1 2025-05-07T19:46:31.5095548Z #define BC_DIM_MAX _POSIX2_BC_DIM_MAX 2025-05-07T19:46:31.5095633Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:46:31.5095731Z #define __FILE_defined 1 2025-05-07T19:46:31.5095906Z #define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L 2025-05-07T19:46:31.5096001Z #define _GLIBCXX_HAVE_SINCOS 1 2025-05-07T19:46:31.5096102Z #define __USE_XOPEN_EXTENDED 1 2025-05-07T19:46:31.5096207Z #define __cpp_lib_tuple_element_t 201402L 2025-05-07T19:46:31.5096319Z #define isascii_l(c,l) __isascii_l ((c), (l)) 2025-05-07T19:46:31.5096420Z #define cudaInvalidDeviceId ((int)-2) 2025-05-07T19:46:31.5096529Z #define _GLIBCXX_HAVE_SYS_RESOURCE_H 1 2025-05-07T19:46:31.5096610Z #define __INT16_C(c) c 2025-05-07T19:46:31.5096698Z #define __U32_TYPE unsigned int 2025-05-07T19:46:31.5096803Z #define 
_GLIBCXX_HAVE_SYS_IOCTL_H 1 2025-05-07T19:46:31.5096920Z #define FD_CLR(fd,fdsetp) __FD_CLR (fd, fdsetp) 2025-05-07T19:46:31.5096997Z #define __STDC__ 1 2025-05-07T19:46:31.5097091Z #define _GLIBCXX_HAVE_VWSCANF 1 2025-05-07T19:46:31.5097195Z #define _GLIBCXX_HAVE_EXECINFO_H 1 2025-05-07T19:46:31.5097286Z #define _GLIBCXX_USE_REALPATH 1 2025-05-07T19:46:31.5097508Z #define __attribute_malloc__ __attribute__ ((__malloc__)) 2025-05-07T19:46:31.5097600Z #define __FLT32X_DIG__ 15 2025-05-07T19:46:31.5097744Z #define _GLIBCXX_USE_C99_CTYPE_TR1 1 2025-05-07T19:46:31.5097838Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:46:31.5097945Z #define cudaArrayDeferredMapping 0x80 2025-05-07T19:46:31.5098062Z #define _GLIBCXX_END_NAMESPACE_CONTAINER 2025-05-07T19:46:31.5098151Z #define USHRT_MAX (SHRT_MAX * 2 + 1) 2025-05-07T19:46:31.5098249Z #define __cpp_lib_is_swappable 201603 2025-05-07T19:46:31.5098341Z #define stdin stdin 2025-05-07T19:46:31.5098424Z #define __ino64_t_defined 2025-05-07T19:46:31.5098505Z #define STA_CLK 0x8000 2025-05-07T19:46:31.5098606Z #define __clockid_t_defined 1 2025-05-07T19:46:31.5098746Z #define _GLIBCXX_NOEXCEPT_IF(...) noexcept(__VA_ARGS__) 2025-05-07T19:46:31.5098905Z #define __attribute_noinline__ __attribute__ ((__noinline__)) 2025-05-07T19:46:31.5099001Z #define __cudaCDP2MemsetAsync 2025-05-07T19:46:31.5099112Z #define _PSTL_PRAGMA_SIMD_SCAN(PRM) 2025-05-07T19:46:31.5099214Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL 2025-05-07T19:46:31.5099383Z #define _GLIBCXX_TR1_POLY_HERMITE_TCC 1 2025-05-07T19:46:31.5099597Z #define __FD_SET(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] |= __FD_MASK (d))) 2025-05-07T19:46:31.5099685Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:46:31.5100626Z #define __tobody(c,f,a,args) (__extension__ ({ int __res; if (sizeof (c) > 1) { if (__builtin_constant_p (c)) { int __c = (c); __res = __c < -128 || __c > 255 ? 
__c : (a)[__c]; } else __res = f args; } else __res = (a)[(int) (c)]; __res; })) 2025-05-07T19:46:31.5100729Z #define DOMAIN 1 2025-05-07T19:46:31.5100821Z #define M_LN2 0.69314718055994530942 2025-05-07T19:46:31.5100908Z #define __NVCC__ 1 2025-05-07T19:46:31.5101014Z #define __cudaCDP2Memset2DAsync 2025-05-07T19:46:31.5101144Z #define __CLOCK_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:31.5101250Z #define _PSTL_PRAGMA_SIMD_EARLYEXIT 2025-05-07T19:46:31.5101354Z #define __throw_exception_again throw 2025-05-07T19:46:31.5101464Z #define M_SQRT2 1.41421356237309504880 2025-05-07T19:46:31.5101557Z #define __EXCEPTION_H 1 2025-05-07T19:46:31.5101658Z #define __FLT32X_MIN_10_EXP__ (-307) 2025-05-07T19:46:31.5101763Z #define HUGE_VAL (__builtin_huge_val()) 2025-05-07T19:46:31.5102101Z #define cudaStreamAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:31.5102217Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:46:31.5102320Z #define _GLIBCXX_INLINE_VERSION 0 2025-05-07T19:46:31.5102430Z #define _GLIBCXX_USE_INT128 1 2025-05-07T19:46:31.5102536Z #define __cpp_lib_bool_constant 201505 2025-05-07T19:46:31.5102638Z #define PTHREAD_KEYS_MAX 1024 2025-05-07T19:46:31.5102783Z #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD 2025-05-07T19:46:31.5102905Z #define __FSFILCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:31.5103022Z #define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 2025-05-07T19:46:31.5103117Z #define __DEC128_MANT_DIG__ 34 2025-05-07T19:46:31.5103239Z #define __cpp_lib_tuples_by_type 201304 2025-05-07T19:46:31.5103341Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:46:31.5103452Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:46:31.5103604Z #define _GLIBCXX_THROW_OR_ABORT(_EXC) (throw (_EXC)) 2025-05-07T19:46:31.5103701Z #define __useconds_t_defined 2025-05-07T19:46:31.5103803Z #define _GLIBCXX_USE_SCHED_YIELD 1 2025-05-07T19:46:31.5103997Z #define __attribute_deprecated__ __attribute__ ((__deprecated__)) 
2025-05-07T19:46:31.5104163Z #define __cpp_lib_type_trait_variable_templates 201510L 2025-05-07T19:46:31.5104249Z #define __SSE_MATH__ 1 2025-05-07T19:46:31.5104341Z #define _IO_wint_t wint_t 2025-05-07T19:46:31.5104448Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:46:31.5104543Z #define _GLIBCXX_VERBOSE 1 2025-05-07T19:46:31.5104640Z #define _GLIBCXX_HAVE_ASINF 1 2025-05-07T19:46:31.5104756Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:46:31.5104863Z #define _GLIBCXX_HAVE_ISINFL 1 2025-05-07T19:46:31.5104961Z #define _GLIBCXX_HAVE_ASINL 1 2025-05-07T19:46:31.5105167Z #define __USE_ATFILE 1 2025-05-07T19:46:31.5105276Z #define _POSIX_OPEN_MAX 20 2025-05-07T19:46:31.5105474Z #define _POSIX_LOGIN_NAME_MAX 9 2025-05-07T19:46:31.5105567Z #define _GCC_PTRDIFF_T 2025-05-07T19:46:31.5105806Z #define cudaKernelNodeAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:31.5105921Z #define __FLT128_DECIMAL_DIG__ 36 2025-05-07T19:46:31.5106022Z #define _POSIX_THREAD_KEYS_MAX 128 2025-05-07T19:46:31.5106128Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:31.5106253Z #define __cpp_lib_array_constexpr 201803L 2025-05-07T19:46:31.5106339Z #define _STDLIB_H 1 2025-05-07T19:46:31.5106486Z #define __exctype(name) extern int name (int) __THROW 2025-05-07T19:46:31.5106585Z #define __FLT32_HAS_QUIET_NAN__ 1 2025-05-07T19:46:31.5106697Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:46:31.5106829Z #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:31.5106943Z #define __SURFACE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:31.5107051Z #define __SM_61_INTRINSICS_H__ 2025-05-07T19:46:31.5107245Z #define _GLIBCXX_PACKAGE_STRING "package-unused version-unused" 2025-05-07T19:46:31.5107411Z #define __isxdigit_l(c,l) __isctype_l((c), _ISxdigit, (l)) 2025-05-07T19:46:31.5107535Z #define __glibcxx_requires_nonempty() 2025-05-07T19:46:31.5107653Z #define w_stopsig __wait_stopped.__w_stopsig 2025-05-07T19:46:31.5107744Z #define __ldiv_t_defined 1 
2025-05-07T19:46:31.5107936Z #define __glibcxx_requires_irreflexive_pred(_First,_Last,_Pred) 2025-05-07T19:46:31.5108043Z #define ___int_ptrdiff_t_h 2025-05-07T19:46:31.5108217Z #define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:46:31.5108321Z #define __cudaCDP2EventDestroy 2025-05-07T19:46:31.5108427Z #define __HOST_DEFINES_H__ 2025-05-07T19:46:31.5108530Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:31.5108637Z #define __SM_20_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:31.5108738Z #define _GLIBCXX_USE_NANOSLEEP 1 2025-05-07T19:46:31.5108835Z #define CUDART_CB 2025-05-07T19:46:31.5108943Z #define BC_BASE_MAX _POSIX2_BC_BASE_MAX 2025-05-07T19:46:31.5109069Z #define _GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 1 2025-05-07T19:46:31.5109174Z #define MB_LEN_MAX 16 2025-05-07T19:46:31.5109411Z #define __glibcxx_requires_partitioned_lower_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:31.5109511Z #define _GLIBCXX11_USE_C99_WCHAR 1 2025-05-07T19:46:31.5109640Z #define _IO_peekc(_fp) _IO_peekc_unlocked (_fp) 2025-05-07T19:46:31.5109768Z #define _GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE 1 2025-05-07T19:46:31.5109866Z #define _GLIBCXX_HAVE_UNISTD_H 1 2025-05-07T19:46:31.5110021Z #define __glibc_likely(cond) __builtin_expect((cond), 1) 2025-05-07T19:46:31.5110146Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:46:31.5110232Z #define _GNU_SOURCE 1 2025-05-07T19:46:31.5110319Z #define __stub_putmsg 2025-05-07T19:46:31.5110404Z #define __CUDACC__ 1 2025-05-07T19:46:31.5110506Z #define __N(msgid) (msgid) 2025-05-07T19:46:31.5110595Z #define __P(args) args 2025-05-07T19:46:31.5110871Z #define cudaKernelNodeAttributeCooperative cudaLaunchAttributeCooperative 2025-05-07T19:46:31.5110992Z #define __cpp_init_captures 201304L 2025-05-07T19:46:31.5111103Z #define _GLIBCXX17_CONSTEXPR constexpr 2025-05-07T19:46:31.5111194Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:46:31.5111309Z #define __cpp_lib_as_const 201510 2025-05-07T19:46:31.5111391Z #define 
__WCHAR_T 2025-05-07T19:46:31.5111479Z #define __ATOMIC_RELEASE 3 2025-05-07T19:46:31.5111575Z #define __fsblkcnt_t_defined 2025-05-07T19:46:31.5111704Z #define __cudaCDP2EventCreateWithFlags 2025-05-07T19:46:31.5111811Z #define __DEVICE_DOUBLE_FUNCTIONS_H__ 2025-05-07T19:46:31.5111818Z 2025-05-07T19:46:31.5224575Z 2025-05-07T19:46:31.5224941Z + conda run -n build_binary nvcc --version 2025-05-07T19:46:31.5224951Z 2025-05-07T19:46:33.3360179Z nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:46:33.3360603Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:46:33.3360938Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:46:33.3361283Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:46:33.3361955Z Build cuda_12.8.r12.8/compiler.35404655_0 2025-05-07T19:46:33.3362290Z 2025-05-07T19:46:33.3927142Z 2025-05-07T19:46:33.3942213Z which: no nvidia-smi in (CONDA=/github/home/miniconda:/github/home/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:46:33.3944396Z [CHECK] nvidia-smi not found 2025-05-07T19:46:33.3945313Z [INSTALL] Successfully installed CUDA 12.8.0 2025-05-07T19:46:33.4045853Z ##[group]Run . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:33.4046510Z . 
$PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:33.4047165Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:46:33.4047534Z env: 2025-05-07T19:46:33.4047805Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:46:33.4048126Z BUILD_ENV: build_binary 2025-05-07T19:46:33.4048420Z BUILD_TARGET: genai 2025-05-07T19:46:33.4048665Z BUILD_VARIANT: cuda 2025-05-07T19:46:33.4048978Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:46:33.4049243Z ##[endgroup] 2025-05-07T19:46:33.8477914Z ################################################################################ 2025-05-07T19:46:33.8478408Z # Install PyTorch (PIP) 2025-05-07T19:46:33.8478659Z # 2025-05-07T19:46:33.8492244Z # [2025-05-07T19:46:33.848Z] + install_pytorch_pip build_binary nightly cuda/12.8.0 2025-05-07T19:46:33.8492828Z ################################################################################ 2025-05-07T19:46:33.8493146Z 2025-05-07T19:46:33.8531385Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y numpy 2025-05-07T19:46:34.7879931Z Channels: 2025-05-07T19:46:34.7880624Z - conda-forge 2025-05-07T19:46:34.7881294Z Platform: linux-64 2025-05-07T19:46:37.8868261Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:46:39.5146444Z Solving environment: \ | / - done 2025-05-07T19:46:39.8140360Z 2025-05-07T19:46:39.8140818Z ## Package Plan ## 2025-05-07T19:46:39.8141019Z 2025-05-07T19:46:39.8141687Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:46:39.8142045Z 2025-05-07T19:46:39.8142152Z added / updated specs: 2025-05-07T19:46:39.8142436Z - numpy 2025-05-07T19:46:39.8142564Z 2025-05-07T19:46:39.8142568Z 2025-05-07T19:46:39.8142721Z The following packages will be downloaded: 2025-05-07T19:46:39.8142959Z 2025-05-07T19:46:39.8143111Z package | build 2025-05-07T19:46:39.8143481Z ---------------------------|----------------- 2025-05-07T19:46:39.8144012Z libblas-3.9.0 
|31_h59b9bed_openblas 16 KB conda-forge 2025-05-07T19:46:39.8144509Z libcblas-3.9.0 |31_he106b2a_openblas 16 KB conda-forge 2025-05-07T19:46:39.8145140Z liblapack-3.9.0 |31_h7ac8fdf_openblas 16 KB conda-forge 2025-05-07T19:46:39.8145582Z numpy-2.2.5 | py313h17eae1a_0 8.1 MB conda-forge 2025-05-07T19:46:39.8146003Z ------------------------------------------------------------ 2025-05-07T19:46:39.8146373Z Total: 8.2 MB 2025-05-07T19:46:39.8146794Z 2025-05-07T19:46:39.8147003Z The following NEW packages will be INSTALLED: 2025-05-07T19:46:39.8147243Z 2025-05-07T19:46:39.8147510Z libblas conda-forge/linux-64::libblas-3.9.0-31_h59b9bed_openblas 2025-05-07T19:46:39.8148063Z libcblas conda-forge/linux-64::libcblas-3.9.0-31_he106b2a_openblas 2025-05-07T19:46:39.8148655Z liblapack conda-forge/linux-64::liblapack-3.9.0-31_h7ac8fdf_openblas 2025-05-07T19:46:39.8149177Z numpy conda-forge/linux-64::numpy-2.2.5-py313h17eae1a_0 2025-05-07T19:46:39.8149491Z 2025-05-07T19:46:39.8149495Z 2025-05-07T19:46:39.8149499Z 2025-05-07T19:46:39.8149653Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:46:40.1043009Z liblapack-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:46:40.1632857Z libblas-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:46:40.5062299Z libcblas-3.9.0 | 16 KB | ########## | 100% 2025-05-07T19:46:40.5068322Z numpy-2.2.5 | 8.1 MB | ########## | 100%
2025-05-07T19:46:40.5070109Z  2025-05-07T19:46:40.5070376Z 2025-05-07T19:46:40.5070380Z 2025-05-07T19:46:40.5070394Z 2025-05-07T19:46:40.5070609Z  done 2025-05-07T19:46:40.6080090Z Preparing transaction: | done 2025-05-07T19:46:40.7090229Z Verifying transaction: - done 2025-05-07T19:46:40.8101986Z Executing transaction: | done 2025-05-07T19:46:40.9144081Z ################################################################################ 2025-05-07T19:46:40.9144948Z # Install Package From PyTorch PIP: torch 2025-05-07T19:46:40.9145314Z # 2025-05-07T19:46:40.9164559Z # [2025-05-07T19:46:40.915Z] + install_from_pytorch_pip build_binary torch nightly cuda/12.8.0 2025-05-07T19:46:40.9166211Z ################################################################################ 2025-05-07T19:46:40.9166909Z 2025-05-07T19:46:40.9180661Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:46:41.0061390Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:46:41.0061894Z ################################################################################ 2025-05-07T19:46:41.0062321Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:46:41.0062653Z # 2025-05-07T19:46:41.0085179Z # [2025-05-07T19:46:41.007Z] + __prepare_pip_arguments torch nightly cuda/12.8.0 2025-05-07T19:46:41.0085785Z ################################################################################ 2025-05-07T19:46:41.0086036Z 2025-05-07T19:46:41.0118642Z [INSTALL] Extracted package (channel, version): (nightly, LATEST) 2025-05-07T19:46:41.0142534Z [INSTALL] Extracted package variant: cu128 2025-05-07T19:46:41.0159077Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:46:41.0159705Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:46:41.0164276Z [INSTALL] Extracted the full PIP package: --pre torch 2025-05-07T19:46:41.0172275Z [INSTALL] Attempting to install [torch, LATEST] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/cu128/ ... 2025-05-07T19:46:41.0198401Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:31.4323987Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:48:31.4325593Z 2025-05-07T19:48:31.4325850Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:31.4326309Z Collecting torch 2025-05-07T19:48:31.4327046Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp313-cp313-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-05-07T19:48:31.4327858Z Collecting filelock (from torch) 2025-05-07T19:48:31.4328417Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.16.1-py3-none-any.whl (16 kB) 2025-05-07T19:48:31.4329464Z Requirement already satisfied: typing-extensions>=4.10.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from torch) (4.13.2) 2025-05-07T19:48:31.4330661Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from torch) (78.1.1) 2025-05-07T19:48:31.4331414Z Collecting sympy>=1.13.3 (from torch) 2025-05-07T19:48:31.4331971Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.13.3-py3-none-any.whl (6.2 MB) 
2025-05-07T19:48:31.4332947Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 53.9 MB/s eta 0:00:00 2025-05-07T19:48:31.4333336Z Collecting networkx (from torch) 2025-05-07T19:48:31.4333926Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.4.2-py3-none-any.whl (1.7 MB) 2025-05-07T19:48:31.4334637Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 13.2 MB/s eta 0:00:00 2025-05-07T19:48:31.4335420Z Requirement already satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from torch) (3.1.6) 2025-05-07T19:48:31.4336134Z Collecting fsspec (from torch) 2025-05-07T19:48:31.4336689Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2024.10.0-py3-none-any.whl (179 kB) 2025-05-07T19:48:31.4337317Z Collecting nvidia-cuda-nvrtc-cu12==12.8.61 (from torch) 2025-05-07T19:48:31.4338244Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:31.4339198Z Collecting nvidia-cuda-runtime-cu12==12.8.57 (from torch) 2025-05-07T19:48:31.4340223Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:31.4341162Z Collecting nvidia-cuda-cupti-cu12==12.8.57 (from torch) 2025-05-07T19:48:31.4342075Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:31.4342983Z Collecting nvidia-cudnn-cu12==9.8.0.87 (from torch) 2025-05-07T19:48:31.4343773Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-05-07T19:48:31.4344558Z Collecting nvidia-cublas-cu12==12.8.3.14 (from torch) 2025-05-07T19:48:31.4345356Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:31.4346542Z Collecting nvidia-cufft-cu12==11.3.3.41 (from torch) 2025-05-07T19:48:31.4347428Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:31.4348316Z Collecting nvidia-curand-cu12==10.3.9.55 (from torch) 2025-05-07T19:48:31.4349239Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:31.4350052Z Collecting nvidia-cusolver-cu12==11.7.2.55 (from torch) 2025-05-07T19:48:31.4350855Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:31.4351773Z Collecting nvidia-cusparse-cu12==12.5.7.53 (from torch) 2025-05-07T19:48:31.4352618Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:31.4353452Z Collecting nvidia-cusparselt-cu12==0.6.3 (from torch) 2025-05-07T19:48:31.4354210Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB) 2025-05-07T19:48:31.4354932Z Collecting nvidia-nccl-cu12==2.26.2 (from torch) 2025-05-07T19:48:31.4355734Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-05-07T19:48:31.4356544Z Collecting nvidia-nvtx-cu12==12.8.55 (from torch) 2025-05-07T19:48:31.4357329Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:31.4358158Z 
Collecting nvidia-nvjitlink-cu12==12.8.61 (from torch) 2025-05-07T19:48:31.4358991Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:31.4359829Z Collecting nvidia-cufile-cu12==1.13.0.11 (from torch) 2025-05-07T19:48:31.4360659Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:31.4361492Z Collecting pytorch-triton==3.3.0+git96316ce5 (from torch) 2025-05-07T19:48:31.4362356Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:31.4363200Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-05-07T19:48:31.4363772Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-05-07T19:48:31.4364461Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 2.3 MB/s eta 0:00:00 2025-05-07T19:48:31.4365233Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from jinja2->torch) (3.0.2) 2025-05-07T19:48:31.4366362Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp313-cp313-manylinux_2_28_x86_64.whl (1047.0 MB) 2025-05-07T19:48:31.4367198Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 25.0 MB/s eta 0:00:00 2025-05-07T19:48:31.4367920Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl (609.6 MB) 2025-05-07T19:48:31.4368736Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 609.6/609.6 MB 40.2 MB/s eta 0:00:00 2025-05-07T19:48:31.4369531Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-05-07T19:48:31.4370425Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 28.0 MB/s eta 0:00:00 2025-05-07T19:48:31.4371340Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-05-07T19:48:31.4372238Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 84.9 MB/s eta 0:00:00 2025-05-07T19:48:31.4373116Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-05-07T19:48:31.4374009Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 30.6 MB/s eta 0:00:00 2025-05-07T19:48:31.4374714Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl (698.0 MB) 2025-05-07T19:48:31.4375492Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 698.0/698.0 MB 39.5 MB/s eta 0:00:00 2025-05-07T19:48:31.4376282Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-05-07T19:48:31.4377176Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 81.1 MB/s eta 0:00:00 2025-05-07T19:48:31.4377950Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-05-07T19:48:31.4378829Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 9.8 MB/s eta 0:00:00 2025-05-07T19:48:31.4379619Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-05-07T19:48:31.4380631Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 68.6 MB/s eta 0:00:00 2025-05-07T19:48:31.4381456Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl (260.4 MB) 2025-05-07T19:48:31.4382323Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.4/260.4 MB 86.5 MB/s eta 0:00:00 2025-05-07T19:48:31.4383193Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (292.1 MB) 2025-05-07T19:48:31.4384171Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.1/292.1 MB 75.1 MB/s eta 0:00:00 2025-05-07T19:48:31.4384926Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-05-07T19:48:31.4385810Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 70.3 MB/s eta 0:00:00 2025-05-07T19:48:31.4386633Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-05-07T19:48:31.4387570Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 83.1 MB/s eta 0:00:00 2025-05-07T19:48:31.4388434Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.2 MB) 2025-05-07T19:48:31.4389455Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.2/39.2 MB 67.7 MB/s eta 0:00:00 2025-05-07T19:48:31.4390293Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-05-07T19:48:31.4391581Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (153.5 MB) 2025-05-07T19:48:31.4392782Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 153.5/153.5 MB 72.8 MB/s eta 0:00:00 2025-05-07T19:48:31.4394535Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, sympy, pytorch-triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, 
nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch 2025-05-07T19:48:31.4396106Z 2025-05-07T19:48:31.4398103Z Successfully installed filelock-3.16.1 fsspec-2024.10.0 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.8.3.14 nvidia-cuda-cupti-cu12-12.8.57 nvidia-cuda-nvrtc-cu12-12.8.61 nvidia-cuda-runtime-cu12-12.8.57 nvidia-cudnn-cu12-9.8.0.87 nvidia-cufft-cu12-11.3.3.41 nvidia-cufile-cu12-1.13.0.11 nvidia-curand-cu12-10.3.9.55 nvidia-cusolver-cu12-11.7.2.55 nvidia-cusparse-cu12-12.5.7.53 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.8.61 nvidia-nvtx-cu12-12.8.55 pytorch-triton-3.3.0+git96316ce5 sympy-1.13.3 torch-2.8.0.dev20250507+cu128 2025-05-07T19:48:31.4400142Z 2025-05-07T19:48:33.5976151Z torch 2.8.0.dev20250507+cu128 2025-05-07T19:48:33.5976737Z [CHECK] The installed package [torch, nightly/LATEST] is the correct variant (cu128) 2025-05-07T19:48:36.9536511Z [CHECK] Python (sub-)package 'torch.distributed' found ... 2025-05-07T19:48:40.3220693Z [CHECK] NOTE: The installed version is: 2.8.0.dev20250507+cu128 2025-05-07T19:48:40.3222068Z [CHECK] NOTE: Checking _GLIBCXX_USE_CXX11_ABI ... 2025-05-07T19:48:43.6005743Z True 2025-05-07T19:48:43.6006083Z True 2025-05-07T19:48:43.6006202Z 2025-05-07T19:48:43.6586097Z [INSTALL] Successfully installed PyTorch through PyTorch PIP 2025-05-07T19:48:43.6669965Z ##[group]Run if . $PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:48:43.6670624Z if . 
$PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:48:43.6671251Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:48:43.6671584Z env: 2025-05-07T19:48:43.6671801Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:48:43.6672122Z BUILD_ENV: build_binary 2025-05-07T19:48:43.6672387Z BUILD_TARGET: genai 2025-05-07T19:48:43.6672616Z BUILD_VARIANT: cuda 2025-05-07T19:48:43.6672878Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:48:43.6673127Z ##[endgroup] 2025-05-07T19:48:44.1391901Z /github/home/miniconda/bin/conda 2025-05-07T19:48:44.1392919Z ################################################################################ 2025-05-07T19:48:44.1394235Z # Collect PyTorch Environment Information (for Reporting Issues) 2025-05-07T19:48:44.1395396Z # 2025-05-07T19:48:44.1408823Z # [2025-05-07T19:48:44.140Z] + collect_pytorch_env_info build_binary 2025-05-07T19:48:44.1410124Z ################################################################################ 2025-05-07T19:48:44.1410840Z 2025-05-07T19:48:44.1423096Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:48:44.2335619Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:48:44.2341122Z [INFO] Downloading the PyTorch environment info collection script ... 2025-05-07T19:48:44.2342870Z + wget -q https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py 2025-05-07T19:48:44.2343314Z 2025-05-07T19:48:44.3195035Z 2025-05-07T19:48:44.3195825Z [INFO] Collecting PyTorch environment info (will be needed for reporting issues to PyTorch) ... 2025-05-07T19:48:44.3217450Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary python collect_env.py 2025-05-07T19:48:49.8707273Z Collecting environment information... 
2025-05-07T19:48:49.8707829Z PyTorch version: 2.8.0.dev20250507+cu128 2025-05-07T19:48:49.8708225Z Is debug build: False 2025-05-07T19:48:49.8708513Z CUDA used to build PyTorch: 12.8 2025-05-07T19:48:49.8708835Z ROCM used to build PyTorch: N/A 2025-05-07T19:48:49.8709031Z 2025-05-07T19:48:49.8709147Z OS: Amazon Linux 2023.7.20250428 (x86_64) 2025-05-07T19:48:49.8709514Z GCC version: (conda-forge gcc 11.4.0-13) 11.4.0 2025-05-07T19:48:49.8709863Z Clang version: Could not collect 2025-05-07T19:48:49.8710187Z CMake version: version 4.0.2 2025-05-07T19:48:49.8710481Z Libc version: glibc-2.34 2025-05-07T19:48:49.8710668Z 2025-05-07T19:48:49.8711010Z Python version: 3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:10:22) [GCC 13.3.0] (64-bit runtime) 2025-05-07T19:48:49.8711712Z Python platform: Linux-6.1.130-139.222.amzn2023.x86_64-x86_64-with-glibc2.34 2025-05-07T19:48:49.8712306Z Is CUDA available: False 2025-05-07T19:48:49.8712592Z CUDA runtime version: 12.8.61 2025-05-07T19:48:49.8712873Z CUDA_MODULE_LOADING set to: N/A 2025-05-07T19:48:49.8713210Z GPU models and configuration: Could not collect 2025-05-07T19:48:49.8713568Z Nvidia driver version: Could not collect 2025-05-07T19:48:49.8713904Z cuDNN version: Could not collect 2025-05-07T19:48:49.8714190Z HIP runtime version: N/A 2025-05-07T19:48:49.8714473Z MIOpen runtime version: N/A 2025-05-07T19:48:49.8714765Z Is XNNPACK available: True 2025-05-07T19:48:49.8714935Z 2025-05-07T19:48:49.8715015Z CPU: 2025-05-07T19:48:49.8715252Z Architecture: x86_64 2025-05-07T19:48:49.8715601Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:48:49.8716028Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:48:49.8716437Z Byte Order: Little Endian 2025-05-07T19:48:49.8716794Z CPU(s): 96 2025-05-07T19:48:49.8717113Z On-line CPU(s) list: 0-95 2025-05-07T19:48:49.8717462Z Vendor ID: GenuineIntel 2025-05-07T19:48:49.8718299Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:48:49.8718830Z CPU 
family: 6 2025-05-07T19:48:49.8719137Z Model: 85 2025-05-07T19:48:49.8719429Z Thread(s) per core: 2 2025-05-07T19:48:49.8719737Z Core(s) per socket: 24 2025-05-07T19:48:49.8720270Z Socket(s): 2 2025-05-07T19:48:49.8720553Z Stepping: 7 2025-05-07T19:48:49.8720870Z BogoMIPS: 5999.99 2025-05-07T19:48:49.8723165Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:48:49.8725514Z Hypervisor vendor: KVM 2025-05-07T19:48:49.8725829Z Virtualization type: full 2025-05-07T19:48:49.8726189Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:48:49.8726562Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:48:49.8726939Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:48:49.8727304Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:48:49.8727650Z NUMA node(s): 2 2025-05-07T19:48:49.8727957Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:48:49.8728452Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:48:49.8728926Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:48:49.8729481Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:48:49.8729987Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:48:49.8730571Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:48:49.8731161Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:48:49.8731770Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers 
attempted, no microcode; SMT Host state unknown 2025-05-07T19:48:49.8732368Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:48:49.8732750Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:48:49.8733112Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:48:49.8733503Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:48:49.8734048Z Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:48:49.8734885Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:48:49.8735534Z Vulnerability Srbds: Not affected 2025-05-07T19:48:49.8735896Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:48:49.8736133Z 2025-05-07T19:48:49.8736257Z Versions of relevant libraries: 2025-05-07T19:48:49.8736521Z [pip3] numpy==2.2.5 2025-05-07T19:48:49.8736781Z [pip3] nvidia-cublas-cu12==12.8.3.14 2025-05-07T19:48:49.8737085Z [pip3] nvidia-cuda-cupti-cu12==12.8.57 2025-05-07T19:48:49.8737407Z [pip3] nvidia-cuda-nvrtc-cu12==12.8.61 2025-05-07T19:48:49.8737715Z [pip3] nvidia-cuda-runtime-cu12==12.8.57 2025-05-07T19:48:49.8738042Z [pip3] nvidia-cudnn-cu12==9.8.0.87 2025-05-07T19:48:49.8738344Z [pip3] nvidia-cufft-cu12==11.3.3.41 2025-05-07T19:48:49.8738637Z [pip3] nvidia-curand-cu12==10.3.9.55 2025-05-07T19:48:49.8738950Z [pip3] nvidia-cusolver-cu12==11.7.2.55 2025-05-07T19:48:49.8739249Z [pip3] nvidia-cusparse-cu12==12.5.7.53 2025-05-07T19:48:49.8740003Z [pip3] nvidia-cusparselt-cu12==0.6.3 2025-05-07T19:48:49.8740332Z [pip3] nvidia-nccl-cu12==2.26.2 2025-05-07T19:48:49.8740667Z [pip3] nvidia-nvjitlink-cu12==12.8.61 2025-05-07T19:48:49.8740991Z [pip3] nvidia-nvtx-cu12==12.8.55 2025-05-07T19:48:49.8741327Z [pip3] pytorch-triton==3.3.0+git96316ce5 2025-05-07T19:48:49.8758962Z [pip3] torch==2.8.0.dev20250507+cu128 2025-05-07T19:48:49.8759408Z [conda] cuda-cudart 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:48:49.8759907Z 
[conda] cuda-cudart-dev 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:48:49.8760436Z [conda] cuda-cudart-dev_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:48:49.8760988Z [conda] cuda-cudart-static 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:48:49.8761563Z [conda] cuda-cudart-static_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:48:49.8762128Z [conda] cuda-cudart_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:48:49.8762616Z [conda] cuda-cupti 12.8.57 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8763121Z [conda] cuda-cupti-dev 12.8.57 h5888daf_0 conda-forge 2025-05-07T19:48:49.8763628Z [conda] cuda-libraries 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:48:49.8764130Z [conda] cuda-libraries-dev 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:48:49.8764639Z [conda] cuda-nvrtc 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8765112Z [conda] cuda-nvrtc-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:48:49.8765600Z [conda] cuda-nvtx 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8766218Z [conda] cuda-opencl 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8766726Z [conda] cuda-opencl-dev 12.8.55 h5888daf_0 conda-forge 2025-05-07T19:48:49.8767231Z [conda] cuda-runtime 12.8.0 ha804496_0 conda-forge 2025-05-07T19:48:49.8767702Z [conda] libcublas 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:48:49.8768205Z [conda] libcublas-dev 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:48:49.8768683Z [conda] libcufft 11.3.3.41 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8769172Z [conda] libcufft-dev 11.3.3.41 h5888daf_0 conda-forge 2025-05-07T19:48:49.8769646Z [conda] libcurand 10.3.9.55 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8770143Z [conda] libcurand-dev 10.3.9.55 h5888daf_0 conda-forge 2025-05-07T19:48:49.8770646Z [conda] libcusolver 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:48:49.8771140Z [conda] libcusolver-dev 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:48:49.8771653Z [conda] libcusparse 12.5.7.53 hbd13f7d_0 conda-forge 
2025-05-07T19:48:49.8772142Z [conda] libcusparse-dev 12.5.7.53 h5888daf_0 conda-forge 2025-05-07T19:48:49.8772651Z [conda] libnvjitlink 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:48:49.8773163Z [conda] libnvjitlink-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:48:49.8773631Z [conda] numpy 2.2.5 py313h17eae1a_0 conda-forge 2025-05-07T19:48:49.8774117Z [conda] nvidia-cublas-cu12 12.8.3.14 pypi_0 pypi 2025-05-07T19:48:49.8774616Z [conda] nvidia-cuda-cupti-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:48:49.8775136Z [conda] nvidia-cuda-nvrtc-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:48:49.8775644Z [conda] nvidia-cuda-runtime-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:48:49.8776278Z [conda] nvidia-cudnn-cu12 9.8.0.87 pypi_0 pypi 2025-05-07T19:48:49.8776779Z [conda] nvidia-cufft-cu12 11.3.3.41 pypi_0 pypi 2025-05-07T19:48:49.8777258Z [conda] nvidia-curand-cu12 10.3.9.55 pypi_0 pypi 2025-05-07T19:48:49.8777765Z [conda] nvidia-cusolver-cu12 11.7.2.55 pypi_0 pypi 2025-05-07T19:48:49.8778263Z [conda] nvidia-cusparse-cu12 12.5.7.53 pypi_0 pypi 2025-05-07T19:48:49.8778789Z [conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi 2025-05-07T19:48:49.8779280Z [conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi 2025-05-07T19:48:49.8780135Z [conda] nvidia-nvjitlink-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:48:49.8780762Z [conda] nvidia-nvtx-cu12 12.8.55 pypi_0 pypi 2025-05-07T19:48:49.8781289Z [conda] pytorch-triton 3.3.0+git96316ce5 pypi_0 pypi 2025-05-07T19:48:49.8781820Z [conda] torch 2.8.0.dev20250507+cu128 pypi_0 pypi 2025-05-07T19:48:49.8782124Z 2025-05-07T19:48:49.9669950Z ##[group]Run . $PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:48:49.9670664Z . 
$PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:48:49.9671358Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:48:49.9671764Z env: 2025-05-07T19:48:49.9672127Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:48:49.9672479Z BUILD_ENV: build_binary 2025-05-07T19:48:49.9672749Z BUILD_TARGET: genai 2025-05-07T19:48:49.9673203Z BUILD_VARIANT: cuda 2025-05-07T19:48:49.9673469Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:48:49.9673781Z ##[endgroup] 2025-05-07T19:48:50.4320710Z ################################################################################ 2025-05-07T19:48:50.4321387Z # Install cuDNN 2025-05-07T19:48:50.4321634Z # 2025-05-07T19:48:50.4336474Z # [2025-05-07T19:48:50.432Z] + install_cudnn build_binary /__w/FBGEMM/FBGEMM/build_only/cudnn 12.8.0 2025-05-07T19:48:50.4337102Z ################################################################################ 2025-05-07T19:48:50.4337342Z 2025-05-07T19:48:50.4359647Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:48:50.5204854Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:48:50.5205340Z [INSTALL] cuda_concat_version is determined to be: 128 2025-05-07T19:48:50.5205964Z [INSTALL] Could not find cuDNN URL for the given cuda_concat_version 128; defaulting to cuDNN for CUDA 11.8 2025-05-07T19:48:50.5206567Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:48:50.5206799Z 2025-05-07T19:48:50.5221529Z 2025-05-07T19:48:50.5222269Z + mkdir -p /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:48:50.5223036Z 2025-05-07T19:48:50.5237966Z 2025-05-07T19:48:50.5452123Z [INSTALL] Downloading cuDNN to /tmp/tmp.9ciDOmo2Hi ... 2025-05-07T19:48:50.5470197Z [EXEC] [ATTEMPT 0/3] + wget -q https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz -O cudnn.tar.xz 2025-05-07T19:48:56.7540283Z [INSTALL] Unpacking cuDNN ... 
2025-05-07T19:48:56.7541237Z + tar -xvf cudnn.tar.xz 2025-05-07T19:48:56.7541720Z 2025-05-07T19:48:56.7579716Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/ 2025-05-07T19:48:56.7580723Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/ 2025-05-07T19:48:56.7581207Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static.a 2025-05-07T19:48:59.2051544Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static_v8.a 2025-05-07T19:48:59.2053329Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static.a 2025-05-07T19:49:01.5218759Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static_v8.a 2025-05-07T19:49:01.5219901Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static.a 2025-05-07T19:49:09.9186217Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static_v8.a 2025-05-07T19:49:09.9188036Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static.a 2025-05-07T19:49:11.5038640Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static_v8.a 2025-05-07T19:49:11.5039613Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static.a 2025-05-07T19:49:13.1866898Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static_v8.a 2025-05-07T19:49:13.1868665Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static.a 2025-05-07T19:49:14.6992684Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static_v8.a 2025-05-07T19:49:14.6993752Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8 2025-05-07T19:49:14.6994225Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so 2025-05-07T19:49:14.6994748Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8.7.0 2025-05-07T19:49:14.7005068Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8 2025-05-07T19:49:14.7007129Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so 2025-05-07T19:49:14.7008746Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8.7.0 2025-05-07T19:49:17.0698461Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8 2025-05-07T19:49:17.0699047Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so 2025-05-07T19:49:17.0699744Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8.7.0 2025-05-07T19:49:19.3183654Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so 2025-05-07T19:49:19.3184523Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8 2025-05-07T19:49:19.3185099Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8.7.0 2025-05-07T19:49:27.9291008Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so 2025-05-07T19:49:27.9292718Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8.7.0 2025-05-07T19:49:29.5898380Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8 2025-05-07T19:49:29.5899009Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8.7.0 2025-05-07T19:49:31.3205569Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so 2025-05-07T19:49:31.3206346Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8 2025-05-07T19:49:31.3207029Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8.7.0 2025-05-07T19:49:32.8741237Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so 2025-05-07T19:49:32.8741853Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8 2025-05-07T19:49:32.8742348Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/ 2025-05-07T19:49:32.8742797Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_v8.h 2025-05-07T19:49:32.8743354Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer_v8.h 2025-05-07T19:49:32.8743893Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train_v8.h 2025-05-07T19:49:32.8744454Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend_v8.h 2025-05-07T19:49:32.8745003Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer_v8.h 2025-05-07T19:49:32.8745537Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train_v8.h 2025-05-07T19:49:32.8746103Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer_v8.h 2025-05-07T19:49:32.8746646Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train_v8.h 2025-05-07T19:49:32.8747203Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version_v8.h 2025-05-07T19:49:32.8747701Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn.h 2025-05-07T19:49:32.8748321Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer.h 2025-05-07T19:49:32.8748854Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train.h 2025-05-07T19:49:32.8749355Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend.h 2025-05-07T19:49:32.8749875Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer.h 2025-05-07T19:49:32.8750375Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train.h 2025-05-07T19:49:32.8750903Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer.h 2025-05-07T19:49:32.8751431Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train.h 2025-05-07T19:49:32.8751933Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version.h 2025-05-07T19:49:32.8752401Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/LICENSE 2025-05-07T19:49:32.8768795Z 2025-05-07T19:49:32.8769340Z [INSTALL] Moving cuDNN files to /__w/FBGEMM/FBGEMM/build_only/cudnn ... 
2025-05-07T19:49:32.8769895Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:49:32.8770158Z 2025-05-07T19:49:32.8784252Z 2025-05-07T19:49:32.8784416Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:49:32.8784691Z 2025-05-07T19:49:32.8804609Z 2025-05-07T19:49:32.8805008Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:32.8805672Z 2025-05-07T19:49:32.9178841Z 2025-05-07T19:49:32.9179477Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:32.9179898Z 2025-05-07T19:49:34.5587404Z 2025-05-07T19:49:34.5587682Z /__w/FBGEMM/FBGEMM 2025-05-07T19:49:34.5588081Z + rm -rf /tmp/tmp.9ciDOmo2Hi 2025-05-07T19:49:34.5588284Z 2025-05-07T19:49:34.6170974Z 2025-05-07T19:49:34.6183739Z [INSTALL] Set environment variables CUDNN_INCLUDE_DIR and CUDNN_LIBRARY ... 2025-05-07T19:49:34.6184742Z + conda env config vars set -n build_binary CUDNN_INCLUDE_DIR=/__w/FBGEMM/FBGEMM/build_only/cudnn/include CUDNN_LIBRARY=/__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:49:34.6185421Z 2025-05-07T19:49:35.0265797Z 2025-05-07T19:49:35.0266724Z [INSTALL] Successfully installed cuDNN (for CUDA 12.8.0) 2025-05-07T19:49:35.0340555Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:49:35.0341267Z . 
$PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:49:35.0341928Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:35.0342328Z env: 2025-05-07T19:49:35.0342588Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:35.0342955Z BUILD_ENV: build_binary 2025-05-07T19:49:35.0343233Z BUILD_TARGET: genai 2025-05-07T19:49:35.0343521Z BUILD_VARIANT: cuda 2025-05-07T19:49:35.0343814Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:35.0344097Z ##[endgroup] 2025-05-07T19:49:35.4803207Z ################################################################################ 2025-05-07T19:49:35.4804296Z # Prepare FBGEMM-GPU Build 2025-05-07T19:49:35.4805052Z # 2025-05-07T19:49:35.4820915Z # [2025-05-07T19:49:35.481Z] + prepare_fbgemm_gpu_build build_binary 2025-05-07T19:49:35.4822366Z ################################################################################ 2025-05-07T19:49:35.4823059Z 2025-05-07T19:49:35.4846246Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:35.5745417Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:35.5766045Z [BUILD] Running git submodules update ... 
2025-05-07T19:49:35.5788813Z [EXEC] [ATTEMPT 0/3] + git submodule sync 2025-05-07T19:49:35.6128001Z Synchronizing submodule url for '../external/asmjit' 2025-05-07T19:49:35.6129044Z Synchronizing submodule url for '../external/composable_kernel' 2025-05-07T19:49:35.6129753Z Synchronizing submodule url for '../external/cpuinfo' 2025-05-07T19:49:35.6130209Z Synchronizing submodule url for '../external/cutlass' 2025-05-07T19:49:35.6130659Z Synchronizing submodule url for '../external/googletest' 2025-05-07T19:49:35.6131144Z Synchronizing submodule url for '../external/hipify_torch' 2025-05-07T19:49:35.6131590Z Synchronizing submodule url for '../external/json' 2025-05-07T19:49:35.6162281Z [EXEC] [ATTEMPT 0/3] + git submodule update --init --recursive 2025-05-07T19:49:35.6621964Z [BUILD] Installing other build dependencies ... 2025-05-07T19:49:35.6643942Z [EXEC] [ATTEMPT 0/3] + conda run --no-capture-output -n build_binary python -m pip install -r requirements.txt 2025-05-07T19:49:37.7944290Z Collecting backports.tarfile (from -r requirements.txt (line 13)) 2025-05-07T19:49:37.8131571Z Downloading backports.tarfile-1.2.0-py3-none-any.whl.metadata (2.0 kB) 2025-05-07T19:49:37.8222495Z Requirement already satisfied: build in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 14)) (1.2.2.post1) 2025-05-07T19:49:37.9239630Z Collecting cmake (from -r requirements.txt (line 15)) 2025-05-07T19:49:37.9270402Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.3 kB) 2025-05-07T19:49:37.9356334Z Requirement already satisfied: click in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 16)) (8.1.8) 2025-05-07T19:49:37.9358906Z Requirement already satisfied: hypothesis in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 17)) (6.131.14) 2025-05-07T19:49:37.9361408Z Requirement already 
satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 18)) (3.1.6) 2025-05-07T19:49:37.9365552Z Requirement already satisfied: mpmath==1.3.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 19)) (1.3.0) 2025-05-07T19:49:37.9636918Z Collecting ninja (from -r requirements.txt (line 20)) 2025-05-07T19:49:37.9669366Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.0 kB) 2025-05-07T19:49:37.9751655Z Requirement already satisfied: numpy>=2.0.2 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 21)) (2.2.5) 2025-05-07T19:49:37.9884212Z Collecting pyre-extensions (from -r requirements.txt (line 22)) 2025-05-07T19:49:37.9908816Z Downloading pyre_extensions-0.0.32-py3-none-any.whl.metadata (4.0 kB) 2025-05-07T19:49:37.9986901Z Requirement already satisfied: pyyaml in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 23)) (6.0.2) 2025-05-07T19:49:37.9989981Z Requirement already satisfied: scikit-build in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 24)) (0.18.1) 2025-05-07T19:49:38.0001088Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from -r requirements.txt (line 25)) (78.1.1) 2025-05-07T19:49:38.0207870Z Collecting setuptools_git_versioning (from -r requirements.txt (line 26)) 2025-05-07T19:49:38.0240959Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl.metadata (6.1 kB) 2025-05-07T19:49:38.0435779Z Collecting tabulate (from -r requirements.txt (line 27)) 2025-05-07T19:49:38.0462334Z Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB) 2025-05-07T19:49:38.0715933Z Collecting patchelf (from -r requirements.txt (line 28)) 2025-05-07T19:49:38.0746530Z 
Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl.metadata (3.5 kB) 2025-05-07T19:49:38.0845646Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from build->-r requirements.txt (line 14)) (25.0) 2025-05-07T19:49:38.0849775Z Requirement already satisfied: pyproject_hooks in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from build->-r requirements.txt (line 14)) (1.2.0) 2025-05-07T19:49:38.0896958Z Requirement already satisfied: attrs>=22.2.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from hypothesis->-r requirements.txt (line 17)) (25.3.0) 2025-05-07T19:49:38.0902772Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from hypothesis->-r requirements.txt (line 17)) (2.4.0) 2025-05-07T19:49:38.0949431Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from jinja2->-r requirements.txt (line 18)) (3.0.2) 2025-05-07T19:49:38.1077975Z Collecting typing-inspect (from pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:49:38.1130811Z Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB) 2025-05-07T19:49:38.1203791Z Requirement already satisfied: typing-extensions in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from pyre-extensions->-r requirements.txt (line 22)) (4.13.2) 2025-05-07T19:49:38.1219224Z Requirement already satisfied: distro in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from scikit-build->-r requirements.txt (line 24)) (1.9.0) 2025-05-07T19:49:38.1230735Z Requirement already satisfied: wheel>=0.32.0 in /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages (from scikit-build->-r requirements.txt (line 24)) (0.45.1) 
2025-05-07T19:49:38.1495170Z Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:49:38.1525224Z Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) 2025-05-07T19:49:38.1644621Z Downloading backports.tarfile-1.2.0-py3-none-any.whl (30 kB) 2025-05-07T19:49:38.1733741Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.9 MB) 2025-05-07T19:49:38.2846277Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.9/27.9 MB 257.1 MB/s eta 0:00:00 2025-05-07T19:49:38.2877392Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB) 2025-05-07T19:49:38.2968254Z Downloading pyre_extensions-0.0.32-py3-none-any.whl (12 kB) 2025-05-07T19:49:38.3029340Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl (10 kB) 2025-05-07T19:49:38.3093318Z Downloading tabulate-0.9.0-py3-none-any.whl (35 kB) 2025-05-07T19:49:38.3149614Z Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl (466 kB) 2025-05-07T19:49:38.3222201Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-05-07T19:49:38.3294503Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-05-07T19:49:38.4817362Z Installing collected packages: tabulate, setuptools_git_versioning, patchelf, ninja, mypy-extensions, cmake, backports.tarfile, typing-inspect, pyre-extensions 2025-05-07T19:49:39.3053040Z 2025-05-07T19:49:39.3101287Z Successfully installed backports.tarfile-1.2.0 cmake-4.0.0 mypy-extensions-1.1.0 ninja-1.11.1.4 patchelf-0.17.2.2 pyre-extensions-0.0.32 setuptools_git_versioning-2.1.0 tabulate-0.9.0 typing-inspect-0.9.0 2025-05-07T19:49:39.3103635Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. 
Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:49:39.4453656Z ################################################################################ 2025-05-07T19:49:39.4454076Z # Install PyTorch (PyTorch PIP) 2025-05-07T19:49:39.4454385Z # 2025-05-07T19:49:39.4471855Z # [2025-05-07T19:49:39.446Z] + install_triton_pip build_binary 2025-05-07T19:49:39.4472761Z ################################################################################ 2025-05-07T19:49:39.4473086Z 2025-05-07T19:49:39.4473337Z [BUILD] Installing pytorch-triton nightly/3.2.0+git4b3bb1f8 from PIP ... 2025-05-07T19:49:39.4473835Z ################################################################################ 2025-05-07T19:49:39.4474222Z # Install Package From PyTorch PIP: pytorch-triton 2025-05-07T19:49:39.4474615Z # 2025-05-07T19:49:39.4495766Z # [2025-05-07T19:49:39.448Z] + install_from_pytorch_pip build_binary pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:49:39.4496547Z ################################################################################ 2025-05-07T19:49:39.4496785Z 2025-05-07T19:49:39.4516583Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:39.5462315Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:39.5463428Z ################################################################################ 2025-05-07T19:49:39.5464478Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:49:39.5465317Z # 2025-05-07T19:49:39.5485029Z # [2025-05-07T19:49:39.547Z] + __prepare_pip_arguments pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:49:39.5486611Z ################################################################################ 2025-05-07T19:49:39.5487318Z 2025-05-07T19:49:39.5534056Z [INSTALL] Extracted package (channel, version): (nightly, 3.2.0+git4b3bb1f8) 2025-05-07T19:49:39.5548785Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:49:39.5550458Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:39.5556458Z [INSTALL] Extracted the full PIP package: --pre pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:49:39.5563263Z [INSTALL] Attempting to install [pytorch-triton, 3.2.0+git4b3bb1f8] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/ ... 2025-05-07T19:49:39.5588583Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre pytorch-triton==3.2.0+git4b3bb1f8 --index-url https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:45.1692281Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-05-07T19:49:45.1693696Z torch 2.8.0.dev20250507+cu128 requires pytorch-triton==3.3.0+git96316ce5; platform_system == "Linux", but you have pytorch-triton 3.2.0+git4b3bb1f8 which is incompatible. 2025-05-07T19:49:45.1697329Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:49:45.1698997Z 2025-05-07T19:49:45.1705126Z Looking in indexes: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:45.1705594Z Collecting pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:49:45.1706500Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.3 kB) 2025-05-07T19:49:45.1707878Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (166.5 MB) 2025-05-07T19:49:45.1709185Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.5/166.5 MB 158.3 MB/s eta 0:00:00 2025-05-07T19:49:45.1709592Z Installing collected packages: pytorch-triton 2025-05-07T19:49:45.1709988Z Attempting uninstall: pytorch-triton 2025-05-07T19:49:45.1710418Z Found existing installation: pytorch-triton 3.3.0+git96316ce5 2025-05-07T19:49:45.1710870Z Uninstalling pytorch-triton-3.3.0+git96316ce5: 2025-05-07T19:49:45.1711331Z Successfully uninstalled pytorch-triton-3.3.0+git96316ce5 2025-05-07T19:49:45.1711800Z Successfully installed pytorch-triton-3.2.0+git4b3bb1f8 2025-05-07T19:49:45.1712097Z 2025-05-07T19:49:47.2811343Z [CHECK] Python (sub-)package 'triton' found ... 2025-05-07T19:49:47.2813593Z [CHECK] Printing out the pytorch-triton version ... 2025-05-07T19:49:49.2919111Z ################################################################################ 2025-05-07T19:49:49.2920428Z [CHECK] The installed VERSION of pytorch-triton is: 3.2.0 2025-05-07T19:49:49.2921700Z ################################################################################ 2025-05-07T19:49:49.2922388Z 2025-05-07T19:49:51.2779491Z [CHECK] Python (sub-)package 'numpy' found ... 2025-05-07T19:49:53.3386239Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:49:53.3387510Z [BUILD] Successfully ran git submodules update 2025-05-07T19:49:53.3490635Z ##[group]Run . 
$PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:49:53.3491363Z . $PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:49:53.3491975Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:53.3492320Z env: 2025-05-07T19:49:53.3492544Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:53.3492874Z BUILD_ENV: build_binary 2025-05-07T19:49:53.3493117Z BUILD_TARGET: genai 2025-05-07T19:49:53.3493361Z BUILD_VARIANT: cuda 2025-05-07T19:49:53.3493610Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:53.3493870Z ##[endgroup] 2025-05-07T19:49:53.7591944Z [BUILD] BUILD_TARGET_VARIANT: genai/cuda 2025-05-07T19:49:53.7592366Z [BUILD] Extracted build target: genai 2025-05-07T19:49:53.7592924Z [BUILD] Extracted build variant: cuda 2025-05-07T19:49:55.5710673Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:49:55.5711497Z 2025-05-07T19:49:55.6285768Z [CHECK] Binary cc found in PATH 2025-05-07T19:49:57.4185678Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:49:57.4186270Z 2025-05-07T19:49:57.4774410Z [CHECK] Binary gcc found in PATH 2025-05-07T19:49:59.2743607Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:49:59.2744066Z 2025-05-07T19:49:59.3329292Z [CHECK] Binary c++ found in PATH 2025-05-07T19:50:01.1253255Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:50:01.1254073Z 2025-05-07T19:50:01.1841525Z [CHECK] Binary g++ found in PATH 2025-05-07T19:50:03.0846783Z [BUILD] Extracted and set Python tag: py313 2025-05-07T19:50:03.0847349Z [BUILD] Extracted and set Python platform name: manylinux_2_28_x86_64 2025-05-07T19:50:03.1067561Z core = 24 2025-05-07T19:50:03.1279128Z sockets = 2 2025-05-07T19:50:03.1280094Z [BUILD] Set multicore run option for setup.py: -j 48 2025-05-07T19:50:03.1281175Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:50:03.1282011Z [BUILD] Running pre-build cleanups ... 
2025-05-07T19:50:03.1282900Z + rm -rf dist 2025-05-07T19:50:03.1283294Z 2025-05-07T19:50:03.1293749Z 2025-05-07T19:50:03.1294660Z + conda run --no-capture-output -n build_binary python setup.py clean 2025-05-07T19:50:03.1295671Z 2025-05-07T19:50:06.2537434Z INFO:root:running clean 2025-05-07T19:50:06.2538385Z [SETUP.PY] ARGV: ['setup.py', 'clean'] 2025-05-07T19:50:06.2539960Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:06.2541101Z [SETUP.PY] Other arguments: ['clean'] 2025-05-07T19:50:06.2541674Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:06.2542261Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:06.2542878Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:06.2543478Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:06.2543917Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:06.2545208Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:06.6396548Z 2025-05-07T19:50:06.6397312Z [BUILD] Printing git status ... 2025-05-07T19:50:06.6398213Z + git status 2025-05-07T19:50:06.6398587Z 2025-05-07T19:50:07.2245365Z HEAD detached at pull/4066/merge 2025-05-07T19:50:07.2246323Z Untracked files: 2025-05-07T19:50:07.2247215Z (use "git add <file>..."
to include in what will be committed) 2025-05-07T19:50:07.2248333Z ../build_only/ 2025-05-07T19:50:07.2248970Z ../collect_env.py 2025-05-07T19:50:07.2249610Z fbgemm_gpu/docs/version.py 2025-05-07T19:50:07.2249912Z 2025-05-07T19:50:07.2250490Z nothing added to commit but untracked files present (use "git add" to track) 2025-05-07T19:50:07.2250854Z 2025-05-07T19:50:07.2250942Z + git diff 2025-05-07T19:50:07.2251084Z 2025-05-07T19:50:07.2542141Z 2025-05-07T19:50:07.2543335Z ################################################################################ 2025-05-07T19:50:07.2544281Z # Configure FBGEMM-GPU Build 2025-05-07T19:50:07.2544574Z # 2025-05-07T19:50:07.2562383Z # [2025-05-07T19:50:07.255Z] + __configure_fbgemm_gpu_build 2025-05-07T19:50:07.2562908Z ################################################################################ 2025-05-07T19:50:07.2563184Z 2025-05-07T19:50:07.2579823Z [BUILD] Setting the build target: genai ... 2025-05-07T19:50:07.2581085Z [BUILD] Configuring build as CUDA variant (this is the default behavior) ... 
2025-05-07T19:50:09.0930872Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:50:09.0931212Z 2025-05-07T19:50:09.1533655Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:50:10.9718223Z /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:10.9718570Z 2025-05-07T19:50:11.0361320Z [CHECK] Environment variable CUDNN_INCLUDE_DIR is defined in the Conda environment 2025-05-07T19:50:12.8463551Z /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:12.8464272Z 2025-05-07T19:50:12.9061643Z [CHECK] Environment variable CUDNN_LIBRARY is defined in the Conda environment 2025-05-07T19:50:14.7265545Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:14.7265978Z 2025-05-07T19:50:14.7850315Z [CHECK] Environment variable NVML_LIB_PATH is defined in the Conda environment 2025-05-07T19:50:16.6445485Z [BUILD] Using the default architectures for CUDA nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:50:16.6446771Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:50:16.6447285Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:50:16.6447637Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:50:16.6448069Z Build cuda_12.8.r12.8/compiler.35404655_0 ... 2025-05-07T19:50:16.6448524Z [BUILD] Setting the following CUDA targets: 7.0;8.0;9.0;9.0a;10.0a;12.0a 2025-05-07T19:50:16.6449008Z [BUILD] Looking up NVML filepath ... 2025-05-07T19:50:18.5496821Z [BUILD] Looking up NCCL filepath ... 2025-05-07T19:50:22.3960721Z [BUILD] Setting NVCC verbose mode ... 2025-05-07T19:50:22.3961983Z + conda env config vars set -n build_binary NVCC_VERBOSE=1 2025-05-07T19:50:22.3962857Z 2025-05-07T19:50:22.8091049Z 2025-05-07T19:50:22.8091920Z [BUILD] Setting CUDA build args ... 2025-05-07T19:50:24.7332648Z [BUILD] Looking up CUDA version ... 
2025-05-07T19:50:28.4936359Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:50:28.4937424Z 2025-05-07T19:50:30.3461070Z 2025-05-07T19:50:30.3462122Z [BUILD] Setting NVCC flags ... 2025-05-07T19:50:30.3463167Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-std=c++20 -Xcompiler -std=c++20 -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler" 2025-05-07T19:50:30.3464008Z 2025-05-07T19:50:30.7517895Z 2025-05-07T19:50:30.7518845Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:50:30.7519751Z 2025-05-07T19:50:32.5452924Z -std=c++20 -Xcompiler -std=c++20 -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler 2025-05-07T19:50:32.5454572Z 2025-05-07T19:50:32.6040352Z 2025-05-07T19:50:32.6040866Z [BUILD] Setting CUDA build args ... 2025-05-07T19:50:32.6041410Z + conda run -n build_binary c++ --version 2025-05-07T19:50:32.6041649Z 2025-05-07T19:50:34.4137365Z c++ (conda-forge gcc 11.4.0-13) 11.4.0 2025-05-07T19:50:34.4138524Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:50:34.4139485Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:50:34.4140113Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:50:34.4140500Z 2025-05-07T19:50:34.4140505Z 2025-05-07T19:50:34.4711517Z 2025-05-07T19:50:34.4712949Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:50:34.4713846Z 2025-05-07T19:50:36.3185341Z 2025-05-07T19:50:36.3185945Z [BUILD] Enabling debug features in the build ... 
2025-05-07T19:50:36.3187847Z .github/scripts/fbgemm_gpu_build.bash: line 370: [: : integer expression expected 2025-05-07T19:50:36.3190860Z [BUILD] FBGEMM_GPU build arguments have been set: --verbose --build-target=genai --build-variant=cuda --nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 -DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 --debug 2025-05-07T19:50:36.3193376Z ################################################################################ 2025-05-07T19:50:36.3193755Z # Build FBGEMM-GPU Package (Wheel) 2025-05-07T19:50:36.3194042Z # 2025-05-07T19:50:36.3206890Z # [2025-05-07T19:50:36.320Z] + build_fbgemm_gpu_package build_binary nightly genai/cuda 2025-05-07T19:50:36.3208170Z ################################################################################ 2025-05-07T19:50:36.3208453Z 2025-05-07T19:50:36.3208656Z [BUILD] Building FBGEMM wheel (TARGET=genai, VARIANT=cuda) ... 
2025-05-07T19:50:36.3214931Z + conda run --no-capture-output -n build_binary python -m build --wheel --no-isolation --config-setting=--build-option=--verbose --config-setting=--build-option=--build-target=genai --config-setting=--build-option=--build-variant=cuda --config-setting=--build-option=--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --config-setting=--build-option=--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 --config-setting=--build-option=-DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' --config-setting=--build-option=-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCMAKE_CXX_STANDARD=20 --config-setting=--build-option=--debug --config-setting=--build-option=--package_channel=nightly --config-setting=--build-option=--python-tag=py313 --config-setting=--build-option=--plat-name=manylinux_2_28_x86_64 2025-05-07T19:50:36.3220196Z 2025-05-07T19:50:38.1778694Z * Getting build dependencies for wheel... 
2025-05-07T19:50:39.4883264Z INFO:root:running egg_info 2025-05-07T19:50:39.4921991Z INFO:root:creating fbgemm_gpu_nightly.egg-info 2025-05-07T19:50:39.4923337Z INFO:root:writing fbgemm_gpu_nightly.egg-info/PKG-INFO 2025-05-07T19:50:39.4924998Z INFO:root:writing dependency_links to fbgemm_gpu_nightly.egg-info/dependency_links.txt 2025-05-07T19:50:39.4926057Z INFO:root:writing requirements to fbgemm_gpu_nightly.egg-info/requires.txt 2025-05-07T19:50:39.4926882Z INFO:root:writing top-level names to fbgemm_gpu_nightly.egg-info/top_level.txt 2025-05-07T19:50:39.4991465Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:50:39.4993270Z INFO:root:reading manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:50:39.5002248Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:50:39.5005606Z [SETUP.PY] ARGV: ['setup.py', 'egg_info'] 2025-05-07T19:50:39.5008501Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:39.5009623Z [SETUP.PY] Other arguments: ['egg_info'] 2025-05-07T19:50:39.5010154Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:39.5011132Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:39.5011730Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:39.5012318Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:39.5012846Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 
2025-05-07T19:50:39.5014057Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:39.7960530Z * Building wheel... 2025-05-07T19:50:41.1044247Z [SETUP.PY] ARGV: ['setup.py', 'bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-y5sgrcjz', '--verbose', '--build-target=genai', '--build-variant=cuda', '--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--debug', '--package_channel=nightly', '--python-tag=py313', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:50:41.1049396Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=True, debug=True, dryrun=False, build_target='genai', build_variant='cuda', package_channel='nightly', nvml_lib_path='/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', nccl_lib_path='/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2', use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:41.1052891Z [SETUP.PY] Other arguments: ['bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-y5sgrcjz', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--python-tag=py313', '--plat-name=manylinux_2_28_x86_64'] 
2025-05-07T19:50:41.1054685Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:41.1055239Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:41.1055819Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:41.1056348Z [SETUP.PY] Setting the FBGEMM build target: genai ... 2025-05-07T19:50:41.1056757Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:41.1061908Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DCMAKE_VERBOSE_MAKEFILE=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE', '-DFBGEMM_BUILD_TARGET=genai', '-DFBGEMM_BUILD_VARIANT=cuda', '-DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '-DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include', '-DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2', "-DCMAKE_C_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib'", "-DCMAKE_CXX_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib'", '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20'] 2025-05-07T19:50:41.1067016Z 2025-05-07T19:50:41.1067022Z 2025-05-07T19:50:41.1067347Z -------------------------------------------------------------------------------- 2025-05-07T19:50:41.1067741Z -- Trying 'Ninja' generator 2025-05-07T19:50:41.1067995Z -------------------------------- 
2025-05-07T19:50:41.1068274Z --------------------------- 2025-05-07T19:50:41.1068509Z ---------------------- 2025-05-07T19:50:41.1068749Z ----------------- 2025-05-07T19:50:41.1068952Z ------------ 2025-05-07T19:50:41.1069164Z ------- 2025-05-07T19:50:41.1069351Z -- 2025-05-07T19:50:41.1458175Z CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): 2025-05-07T19:50:41.1460049Z Not searching for unused variables given on the command line. 2025-05-07T19:50:41.1461668Z Compatibility with CMake < 3.10 will be removed from a future version of 2025-05-07T19:50:41.1462918Z CMake. 2025-05-07T19:50:41.1463383Z 2025-05-07T19:50:41.1463736Z Update the VERSION argument <min> value. Or, use the <min>...<max> syntax 2025-05-07T19:50:41.1464307Z to tell CMake that the project requires at least <min> but has been updated 2025-05-07T19:50:41.1464784Z to work with policies introduced by <max> or earlier. 2025-05-07T19:50:41.1465034Z 2025-05-07T19:50:41.1465038Z 2025-05-07T19:50:41.1897190Z -- The C compiler identification is GNU 11.4.0 2025-05-07T19:50:41.1982543Z -- Detecting C compiler ABI info 2025-05-07T19:50:41.2980814Z -- Detecting C compiler ABI info - done 2025-05-07T19:50:41.3164452Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc - skipped 2025-05-07T19:50:41.3165124Z -- Detecting C compile features 2025-05-07T19:50:41.3168608Z -- Detecting C compile features - done 2025-05-07T19:50:41.3983506Z -- The CXX compiler identification is GNU 11.4.0 2025-05-07T19:50:41.4059064Z -- Detecting CXX compiler ABI info 2025-05-07T19:50:41.5042028Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:50:41.5238579Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ - skipped 2025-05-07T19:50:41.5239294Z -- Detecting CXX compile features 2025-05-07T19:50:41.5246501Z -- Detecting CXX compile features - done 2025-05-07T19:50:41.5311537Z -- Configuring done (0.4s)
2025-05-07T19:50:41.5361427Z -- Generating done (0.0s) 2025-05-07T19:50:41.5377868Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_cmake_test_compile/build 2025-05-07T19:50:41.5422024Z -- 2025-05-07T19:50:41.5422289Z ------- 2025-05-07T19:50:41.5422498Z ------------ 2025-05-07T19:50:41.5422726Z ----------------- 2025-05-07T19:50:41.5422949Z ---------------------- 2025-05-07T19:50:41.5423205Z --------------------------- 2025-05-07T19:50:41.5423462Z -------------------------------- 2025-05-07T19:50:41.5423781Z -- Trying 'Ninja' generator - success 2025-05-07T19:50:41.5424158Z -------------------------------------------------------------------------------- 2025-05-07T19:50:41.5424472Z 2025-05-07T19:50:41.5441527Z Configuring Project 2025-05-07T19:50:41.5441957Z Working directory: 2025-05-07T19:50:41.5442888Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build 2025-05-07T19:50:41.5443354Z Command: 2025-05-07T19:50:41.5463517Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/cmake/data/bin/cmake /__w/FBGEMM/FBGEMM/fbgemm_gpu -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install -DPYTHON_VERSION_STRING:STRING=3.13.2 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPYTHON_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.13 -DPYTHON_LIBRARY:PATH=/github/home/miniconda/envs/build_binary/lib/libpython3.13.so -DPython_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.13 
-DPython_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/numpy/_core/include -DPython3_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython3_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.13 -DPython3_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/numpy/_core/include -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch -D_GLIBCXX_USE_CXX11_ABI=1 -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DFBGEMM_BUILD_TARGET=genai -DFBGEMM_BUILD_VARIANT=cuda -DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 '-DCMAKE_C_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib'"'"'' '-DCMAKE_CXX_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib'"'"'' '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release 2025-05-07T19:50:41.5482512Z 2025-05-07T19:50:41.5902830Z Not searching for unused variables given on the command line. 
2025-05-07T19:50:41.5903248Z 2025-05-07T19:50:41.5903253Z 2025-05-07T19:50:41.5903409Z ================================================================================ 2025-05-07T19:50:41.5903770Z Default C compiler flags 2025-05-07T19:50:41.5904185Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:50:41.5904687Z 2025-05-07T19:50:41.5905176Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib 2025-05-07T19:50:41.5905906Z ================================================================================ 2025-05-07T19:50:41.5906156Z 2025-05-07T19:50:41.5906160Z 2025-05-07T19:50:41.5906163Z 2025-05-07T19:50:41.5906315Z ================================================================================ 2025-05-07T19:50:41.5906658Z Default C++ compiler flags 2025-05-07T19:50:41.5907041Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:50:41.5907353Z 2025-05-07T19:50:41.5907809Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib 2025-05-07T19:50:41.5908498Z ================================================================================ 2025-05-07T19:50:41.5908729Z 2025-05-07T19:50:41.5908732Z 2025-05-07T19:50:41.5908736Z 2025-05-07T19:50:41.5908865Z ================================================================================ 2025-05-07T19:50:41.5909179Z AVX2_FLAGS: 2025-05-07T19:50:41.5909317Z 2025-05-07T19:50:41.5909397Z -mavx2 2025-05-07T19:50:41.5909586Z -mf16c 2025-05-07T19:50:41.5909793Z -mfma 2025-05-07T19:50:41.5910108Z -fopenmp 2025-05-07T19:50:41.5910370Z ================================================================================ 2025-05-07T19:50:41.5910603Z 2025-05-07T19:50:41.5910607Z 2025-05-07T19:50:41.5910610Z 2025-05-07T19:50:41.5910752Z ================================================================================ 
2025-05-07T19:50:41.5911166Z AVX512_FLAGS: 2025-05-07T19:50:41.5911330Z 2025-05-07T19:50:41.5911418Z -mavx2 2025-05-07T19:50:41.5911622Z -mf16c 2025-05-07T19:50:41.5911845Z -mfma 2025-05-07T19:50:41.5912051Z -mavx512f 2025-05-07T19:50:41.5912286Z -mavx512bw 2025-05-07T19:50:41.5912523Z -mavx512dq 2025-05-07T19:50:41.5912729Z -mavx512vl 2025-05-07T19:50:41.5912962Z -fopenmp 2025-05-07T19:50:41.5913199Z ================================================================================ 2025-05-07T19:50:41.5913430Z 2025-05-07T19:50:41.5913434Z 2025-05-07T19:50:41.5913460Z 2025-05-07T19:50:41.5913580Z ================================================================================ 2025-05-07T19:50:41.5913935Z The project is built using scikit-build 2025-05-07T19:50:41.5914297Z ================================================================================ 2025-05-07T19:50:41.5914527Z 2025-05-07T19:50:41.5914530Z 2025-05-07T19:50:41.5914534Z 2025-05-07T19:50:41.5914676Z ================================================================================ 2025-05-07T19:50:41.5915003Z Build Settings 2025-05-07T19:50:41.5915165Z 2025-05-07T19:50:41.5915290Z FBGEMM_BUILD_TARGET : genai 2025-05-07T19:50:41.5915584Z FBGEMM_BUILD_VARIANT : cuda 2025-05-07T19:50:41.5915795Z 2025-05-07T19:50:41.5915896Z NVCC_VERBOSE : 2025-05-07T19:50:41.5916184Z CUDNN_INCLUDE_DIR : 2025-05-07T19:50:41.5916454Z CUDNN_LIBRARY : 2025-05-07T19:50:41.5916918Z NVML_LIB_PATH : /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:41.5917409Z TORCH_CUDA_ARCH_LIST : 7.0 2025-05-07T19:50:41.5917693Z 8.0 2025-05-07T19:50:41.5917890Z 9.0 2025-05-07T19:50:41.5918107Z 9.0a 2025-05-07T19:50:41.5918302Z 10.0a 2025-05-07T19:50:41.5918524Z 12.0a 2025-05-07T19:50:41.5918640Z 2025-05-07T19:50:41.5918740Z HIP_ROOT_DIR : 2025-05-07T19:50:41.5919028Z HIPCC_VERBOSE : 2025-05-07T19:50:41.5919318Z AMDGPU_TARGETS : 2025-05-07T19:50:41.5919585Z PYTORCH_ROCM_ARCH : 2025-05-07T19:50:41.5919907Z 
================================================================================ 2025-05-07T19:50:41.5920146Z 2025-05-07T19:50:41.6701837Z -- The CXX compiler identification is GNU 11.4.0 2025-05-07T19:50:41.7112043Z -- The C compiler identification is GNU 11.4.0 2025-05-07T19:50:42.6463224Z -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0 2025-05-07T19:50:42.6558734Z -- Detecting CXX compiler ABI info 2025-05-07T19:50:42.7503741Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:50:42.7707860Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ - skipped 2025-05-07T19:50:42.7709810Z -- Detecting CXX compile features 2025-05-07T19:50:42.7715915Z -- Detecting CXX compile features - done 2025-05-07T19:50:42.7837618Z -- Detecting C compiler ABI info 2025-05-07T19:50:42.8713309Z -- Detecting C compiler ABI info - done 2025-05-07T19:50:42.8902885Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc - skipped 2025-05-07T19:50:42.8906234Z -- Detecting C compile features 2025-05-07T19:50:42.8910965Z -- Detecting C compile features - done 2025-05-07T19:50:42.9011324Z -- Detecting CUDA compiler ABI info 2025-05-07T19:50:43.8197512Z -- Detecting CUDA compiler ABI info - done 2025-05-07T19:50:43.8775123Z -- Check for working CUDA compiler: /github/home/miniconda/envs/build_binary/bin/nvcc - skipped 2025-05-07T19:50:43.8797692Z -- Detecting CUDA compile features 2025-05-07T19:50:43.8798932Z -- Detecting CUDA compile features - done 2025-05-07T19:50:43.8875731Z -- Performing Test C_HAS_AVX_1 2025-05-07T19:50:44.1423753Z -- Performing Test C_HAS_AVX_1 - Failed 2025-05-07T19:50:44.1424791Z -- Performing Test C_HAS_AVX_2 2025-05-07T19:50:44.4140542Z -- Performing Test C_HAS_AVX_2 - Success 2025-05-07T19:50:44.4141568Z -- Performing Test C_HAS_AVX2_1 2025-05-07T19:50:44.6690445Z -- Performing Test C_HAS_AVX2_1 - Failed 
2025-05-07T19:50:44.6691516Z -- Performing Test C_HAS_AVX2_2 2025-05-07T19:50:44.9369479Z -- Performing Test C_HAS_AVX2_2 - Success 2025-05-07T19:50:44.9369912Z -- Performing Test C_HAS_AVX512_1 2025-05-07T19:50:45.1932574Z -- Performing Test C_HAS_AVX512_1 - Failed 2025-05-07T19:50:45.1933660Z -- Performing Test C_HAS_AVX512_2 2025-05-07T19:50:45.4114151Z -- Performing Test C_HAS_AVX512_2 - Success 2025-05-07T19:50:45.4115247Z -- Performing Test CXX_HAS_AVX_1 2025-05-07T19:50:45.6671322Z -- Performing Test CXX_HAS_AVX_1 - Failed 2025-05-07T19:50:45.6671721Z -- Performing Test CXX_HAS_AVX_2 2025-05-07T19:50:45.9401956Z -- Performing Test CXX_HAS_AVX_2 - Success 2025-05-07T19:50:45.9403148Z -- Performing Test CXX_HAS_AVX2_1 2025-05-07T19:50:46.1955405Z -- Performing Test CXX_HAS_AVX2_1 - Failed 2025-05-07T19:50:46.1956468Z -- Performing Test CXX_HAS_AVX2_2 2025-05-07T19:50:46.4641337Z -- Performing Test CXX_HAS_AVX2_2 - Success 2025-05-07T19:50:46.4641856Z -- Performing Test CXX_HAS_AVX512_1 2025-05-07T19:50:46.7210789Z -- Performing Test CXX_HAS_AVX512_1 - Failed 2025-05-07T19:50:46.7211895Z -- Performing Test CXX_HAS_AVX512_2 2025-05-07T19:50:46.9366808Z -- Performing Test CXX_HAS_AVX512_2 - Success 2025-05-07T19:50:46.9551896Z -- Found CUDA: /github/home/miniconda/envs/build_binary/targets/x86_64-linux (found version "12.8") 2025-05-07T19:50:46.9587477Z -- Found CUDAToolkit: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include (found version "12.8.61") 2025-05-07T19:50:46.9666809Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-05-07T19:50:47.0567239Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed 2025-05-07T19:50:47.0567918Z -- Looking for pthread_create in pthreads 2025-05-07T19:50:47.1359033Z -- Looking for pthread_create in pthreads - not found 2025-05-07T19:50:47.1360226Z -- Looking for pthread_create in pthread 2025-05-07T19:50:47.2250421Z -- Looking for pthread_create in pthread - found 2025-05-07T19:50:47.2259659Z -- Found Threads: 
TRUE 2025-05-07T19:50:47.3866649Z -- PyTorch: CUDA detected: 12.8 2025-05-07T19:50:47.3868270Z -- PyTorch: CUDA nvcc is: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/bin/nvcc 2025-05-07T19:50:47.3870084Z -- PyTorch: CUDA toolkit directory: /github/home/miniconda/envs/build_binary/targets/x86_64-linux 2025-05-07T19:50:47.5089902Z -- PyTorch: Header version is: 12.8 2025-05-07T19:50:47.6064807Z -- Found Python: /github/home/miniconda/envs/build_binary/bin/python (found version "3.13.2") found components: Interpreter 2025-05-07T19:50:47.6078366Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-05-07T19:50:47.6079250Z -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-05-07T19:50:47.6079709Z Failed to compute shorthash for libnvrtc.so 2025-05-07T19:50:47.6080063Z Call Stack (most recent call first): 2025-05-07T19:50:47.6080811Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-05-07T19:50:47.6081975Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-05-07T19:50:47.6082869Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:50:47.6083348Z CMakeLists.txt:112 (include) 2025-05-07T19:50:47.6083543Z 2025-05-07T19:50:47.6083547Z 2025-05-07T19:50:47.6083766Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-05-07T19:50:47.6084267Z -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-05-07T19:50:47.6084700Z -- USE_CUFILE is set to 0. 
Compiling without cuFile support 2025-05-07T19:50:47.6085879Z -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a;-gencode;arch=compute_100a,code=sm_100a;-gencode;arch=compute_120a,code=sm_120a 2025-05-07T19:50:47.6439432Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-05-07T19:50:47.6442424Z static library kineto_LIBRARY-NOTFOUND not found. 2025-05-07T19:50:47.6443525Z Call Stack (most recent call first): 2025-05-07T19:50:47.6445971Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-05-07T19:50:47.6446932Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:50:47.6447414Z CMakeLists.txt:112 (include) 2025-05-07T19:50:47.6447609Z 2025-05-07T19:50:47.6447613Z 2025-05-07T19:50:47.6448006Z -- Found Torch: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so 2025-05-07T19:50:47.6448564Z 2025-05-07T19:50:47.6448567Z 2025-05-07T19:50:47.6448692Z ================================================================================ 2025-05-07T19:50:47.6449042Z PyTorch Flags: 2025-05-07T19:50:47.6449263Z 2025-05-07T19:50:47.6449479Z TORCH_INCLUDE_DIRS: 2025-05-07T19:50:47.6449925Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:50:47.6450759Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:47.6451378Z 2025-05-07T19:50:47.6451579Z TORCH_LIBRARIES: 2025-05-07T19:50:47.6451823Z torch 2025-05-07T19:50:47.6452026Z torch_library 2025-05-07T19:50:47.6452499Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:50:47.6453310Z 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:47.6454028Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:47.6454551Z 2025-05-07T19:50:47.6454769Z TORCH_CUDA_OPTIONS: 2025-05-07T19:50:47.6455036Z --expt-relaxed-constexpr 2025-05-07T19:50:47.6455312Z -D__CUDA_NO_HALF_OPERATORS__ 2025-05-07T19:50:47.6455618Z -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 2025-05-07T19:50:47.6455925Z -D__CUDA_NO_HALF2_OPERATORS__ 2025-05-07T19:50:47.6456240Z ================================================================================ 2025-05-07T19:50:47.6456480Z 2025-05-07T19:50:47.6456484Z 2025-05-07T19:50:47.6456487Z 2025-05-07T19:50:47.6456603Z ================================================================================ 2025-05-07T19:50:47.6457085Z NCCL Flags 2025-05-07T19:50:47.6457214Z 2025-05-07T19:50:47.6457624Z NCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:50:47.6458538Z NCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:47.6459210Z ================================================================================ 2025-05-07T19:50:47.6459570Z 2025-05-07T19:50:47.6459574Z 2025-05-07T19:50:47.6459578Z 2025-05-07T19:50:47.6459714Z ================================================================================ 2025-05-07T19:50:47.6460226Z CUDA Driver Path 2025-05-07T19:50:47.6460370Z 2025-05-07T19:50:47.6460762Z CUDA_DRIVER_LIBRARIES=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:47.6461366Z ================================================================================ 2025-05-07T19:50:47.6461619Z 2025-05-07T19:50:47.6461923Z -- Found NVML_LIB_PATH: /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:47.6473256Z 
2025-05-07T19:50:47.6473350Z 2025-05-07T19:50:47.6473886Z ================================================================================ 2025-05-07T19:50:47.6475109Z GPU CPP Library Target: asmjit (SHARED) 2025-05-07T19:50:47.6475996Z 2025-05-07T19:50:47.6476520Z CPU_SRCS: 2025-05-07T19:50:47.6476877Z 2025-05-07T19:50:47.6477088Z 2025-05-07T19:50:47.6477626Z GPU_SRCS: 2025-05-07T19:50:47.6477950Z 2025-05-07T19:50:47.6478159Z 2025-05-07T19:50:47.6478715Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:47.6479134Z 2025-05-07T19:50:47.6479680Z 2025-05-07T19:50:47.6480072Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:47.6480220Z 2025-05-07T19:50:47.6480302Z 2025-05-07T19:50:47.6480513Z OTHER_SRCS: 2025-05-07T19:50:47.6480909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:50:47.6481564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:50:47.6482206Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:50:47.6482839Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:50:47.6483488Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:50:47.6484096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:50:47.6484712Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:50:47.6485317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:50:47.6485951Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:50:47.6486575Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:50:47.6487206Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:50:47.6487855Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/archtraits.cpp 
2025-05-07T19:50:47.6488470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:50:47.6489111Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:50:47.6489737Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:50:47.6490357Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:50:47.6490989Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:50:47.6491599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:50:47.6492227Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:50:47.6492837Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:50:47.6493584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:50:47.6494228Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:50:47.6494866Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:50:47.6495517Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:50:47.6496145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:50:47.6496760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:50:47.6497377Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:50:47.6498032Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:50:47.6498631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:50:47.6499210Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:50:47.6500012Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:50:47.6501027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:50:47.6501648Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:50:47.6502259Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:50:47.6502844Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:50:47.6503457Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:50:47.6504037Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:50:47.6504831Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:50:47.6505416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:50:47.6506026Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:50:47.6506633Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:50:47.6507209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:50:47.6507811Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:50:47.6508390Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:50:47.6509061Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:50:47.6509685Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:50:47.6510315Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:50:47.6510927Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:50:47.6511557Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonevector.cpp 
2025-05-07T19:50:47.6512183Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:50:47.6512922Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:50:47.6513539Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:50:47.6514148Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:50:47.6514780Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:50:47.6515365Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:50:47.6515955Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:50:47.6516539Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:50:47.6517140Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:50:47.6517842Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:50:47.6518274Z 2025-05-07T19:50:47.6518488Z CC_FLAGS: 2025-05-07T19:50:47.6518611Z 2025-05-07T19:50:47.6518693Z 2025-05-07T19:50:47.6518905Z NVCC_FLAGS: 2025-05-07T19:50:47.6519028Z 2025-05-07T19:50:47.6519109Z 2025-05-07T19:50:47.6519322Z HIPCC_FLAGS: 2025-05-07T19:50:47.6519449Z 2025-05-07T19:50:47.6519530Z 2025-05-07T19:50:47.6519733Z INCLUDE_DIRS: 2025-05-07T19:50:47.6519972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:47.6520315Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:47.6520626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:47.6520941Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:47.6521468Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:50:47.6522262Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:47.6522938Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:47.6523362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:47.6523818Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:47.6524313Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:47.6524832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:50:47.6525316Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:47.6525879Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:50:47.6526407Z 2025-05-07T19:50:47.6526612Z Selected Source Files: 2025-05-07T19:50:47.6526789Z 2025-05-07T19:50:47.6526940Z 2025-05-07T19:50:47.6527144Z HIPified Source Files: 2025-05-07T19:50:47.6527318Z 2025-05-07T19:50:47.6527396Z 2025-05-07T19:50:47.6527617Z Library Dependencies: 2025-05-07T19:50:47.6527849Z torch 2025-05-07T19:50:47.6528066Z torch_library 2025-05-07T19:50:47.6528509Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:50:47.6529205Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:47.6529898Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:47.6530708Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:47.6531459Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:47.6532061Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:47.6532483Z 2025-05-07T19:50:47.6532680Z Output Library: 2025-05-07T19:50:47.6532919Z asmjit 2025-05-07T19:50:47.6533114Z 2025-05-07T19:50:47.6533337Z Destination Directory: 2025-05-07T19:50:47.6533582Z fbgemm_gpu 2025-05-07T19:50:47.6533837Z 
================================================================================ 2025-05-07T19:50:47.6534070Z 2025-05-07T19:50:47.6534075Z 2025-05-07T19:50:47.6534079Z 2025-05-07T19:50:47.6534218Z ================================================================================ 2025-05-07T19:50:47.6534563Z GPU CPP Library Target: fbgemm (SHARED) 2025-05-07T19:50:47.6534876Z 2025-05-07T19:50:47.6535061Z CPU_SRCS: 2025-05-07T19:50:47.6535190Z 2025-05-07T19:50:47.6535271Z 2025-05-07T19:50:47.6535463Z GPU_SRCS: 2025-05-07T19:50:47.6535597Z 2025-05-07T19:50:47.6535677Z 2025-05-07T19:50:47.6535889Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:47.6536033Z 2025-05-07T19:50:47.6536112Z 2025-05-07T19:50:47.6536324Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:47.6536467Z 2025-05-07T19:50:47.6536547Z 2025-05-07T19:50:47.6536750Z OTHER_SRCS: 2025-05-07T19:50:47.6537027Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDM.cc 2025-05-07T19:50:47.6537495Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:50:47.6537963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMNBit.cc 2025-05-07T19:50:47.6538400Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtils.cc 2025-05-07T19:50:47.6538920Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RefImplementations.cc 2025-05-07T19:50:47.6539517Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RowWiseSparseAdagradFused.cc 2025-05-07T19:50:47.6540196Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/SparseAdagrad.cc 2025-05-07T19:50:47.6540609Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/Utils.cc 2025-05-07T19:50:47.6541034Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:50:47.6541469Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:50:47.6541925Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:50:47.6542360Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:50:47.6542828Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:50:47.6543224Z 2025-05-07T19:50:47.6543417Z 
CC_FLAGS: 2025-05-07T19:50:47.6543539Z 2025-05-07T19:50:47.6543638Z 2025-05-07T19:50:47.6543831Z NVCC_FLAGS: 2025-05-07T19:50:47.6543975Z 2025-05-07T19:50:47.6544057Z 2025-05-07T19:50:47.6544257Z HIPCC_FLAGS: 2025-05-07T19:50:47.6544408Z 2025-05-07T19:50:47.6544490Z 2025-05-07T19:50:47.6544679Z INCLUDE_DIRS: 2025-05-07T19:50:47.6544937Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:47.6545261Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:47.6545570Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:47.6545907Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:47.6546413Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:50:47.6547237Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:47.6547905Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:47.6548513Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:47.6548953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:47.6549458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:47.6550013Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:50:47.6550483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:47.6551078Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:50:47.6551595Z 2025-05-07T19:50:47.6551828Z Selected Source Files: 2025-05-07T19:50:47.6551989Z 2025-05-07T19:50:47.6552196Z 2025-05-07T19:50:47.6552412Z HIPified Source Files: 2025-05-07T19:50:47.6552565Z 2025-05-07T19:50:47.6552664Z 2025-05-07T19:50:47.6552854Z Library Dependencies: 2025-05-07T19:50:47.6553097Z torch 2025-05-07T19:50:47.6553280Z torch_library 2025-05-07T19:50:47.6553716Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 
2025-05-07T19:50:47.6554359Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:47.6555043Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:47.6555805Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:47.6556525Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:47.6556989Z asmjit 2025-05-07T19:50:47.6557305Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:47.6557702Z 2025-05-07T19:50:47.6557893Z Output Library: 2025-05-07T19:50:47.6558117Z fbgemm 2025-05-07T19:50:47.6558296Z 2025-05-07T19:50:47.6558509Z Destination Directory: 2025-05-07T19:50:47.6558741Z fbgemm_gpu 2025-05-07T19:50:47.6558985Z ================================================================================ 2025-05-07T19:50:47.6559206Z 2025-05-07T19:50:47.6559210Z 2025-05-07T19:50:47.6559213Z 2025-05-07T19:50:47.6559350Z ================================================================================ 2025-05-07T19:50:47.6559676Z Running code generation script ... 
2025-05-07T19:50:47.6560478Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py --opensource 2025-05-07T19:50:47.6561218Z ================================================================================ 2025-05-07T19:50:47.6561459Z 2025-05-07T19:50:48.1836458Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:48.1837488Z [GENERAATE BACKWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py', '--opensource'] 2025-05-07T19:50:48.1838289Z Written: gen_embedding_backward_dense_split_weighted_vbe_cuda.cu 2025-05-07T19:50:48.1838786Z Written: gen_embedding_backward_dense_split_weighted_cuda.cu 2025-05-07T19:50:48.1839332Z Written: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.1839866Z Written: gen_embedding_backward_dense_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:48.1840397Z Written: gen_embedding_backward_dense_split_unweighted_cuda.cu 2025-05-07T19:50:48.1841021Z Written: gen_embedding_backward_dense_split_weighted_vbe_meta.cpp 2025-05-07T19:50:48.1841636Z Written: gen_embedding_backward_dense_split_weighted_meta.cpp 2025-05-07T19:50:48.1842138Z Written: gen_embedding_backward_dense_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.1842645Z Written: gen_embedding_backward_dense_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:48.1843146Z Written: gen_embedding_backward_dense_split_unweighted_meta.cpp 2025-05-07T19:50:48.1843639Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.1844157Z Written: gen_embedding_backward_dense_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.1844674Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.1845497Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.1846042Z Written: 
gen_embedding_backward_dense_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.1846558Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.1847105Z Written: gen_embedding_backward_dense_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.1847632Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.1848198Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.1848726Z Written: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.1849229Z Written: gen_embedding_optimizer_dense_split_device_kernel.cuh 2025-05-07T19:50:48.1849662Z Written: gen_embedding_backward_split_dense.cpp 2025-05-07T19:50:48.1850030Z Written: gen_embedding_backward_dense_split_cpu.cpp 2025-05-07T19:50:48.1850475Z Written: gen_embedding_backward_adagrad_split_weighted_cuda.cu 2025-05-07T19:50:48.1850963Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.1851475Z Written: gen_embedding_backward_adagrad_split_unweighted_cuda.cu 2025-05-07T19:50:48.1851951Z Written: gen_embedding_backward_adagrad_split_weighted_meta.cpp 2025-05-07T19:50:48.1852460Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.1852985Z Written: gen_embedding_backward_adagrad_split_unweighted_meta.cpp 2025-05-07T19:50:48.1853477Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.1854026Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.1854570Z Written: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.1855107Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.1855646Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.1856216Z Written: 
gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.1856733Z Written: gen_embedding_optimizer_adagrad_split_device_kernel.cuh 2025-05-07T19:50:48.1857155Z Written: gen_embedding_backward_split_adagrad.cpp 2025-05-07T19:50:48.1857683Z Written: gen_embedding_split_adagrad_pt2_autograd.cpp 2025-05-07T19:50:48.1858131Z Written: gen_embedding_backward_split_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.1858545Z Written: lookup_adagrad.py 2025-05-07T19:50:48.1858855Z Written: gen_embedding_backward_adagrad_split_cpu.cpp 2025-05-07T19:50:48.1859265Z Written: gen_embedding_backward_split_adagrad_cpu.cpp 2025-05-07T19:50:48.1860038Z Written: gen_embedding_backward_split_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.1860543Z Written: gen_embedding_backward_adam_split_weighted_vbe_cuda.cu 2025-05-07T19:50:48.1861050Z Written: gen_embedding_backward_adam_split_weighted_cuda.cu 2025-05-07T19:50:48.1861552Z Written: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.1862092Z Written: gen_embedding_backward_adam_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:48.1862591Z Written: gen_embedding_backward_adam_split_unweighted_cuda.cu 2025-05-07T19:50:48.1863102Z Written: gen_embedding_backward_adam_split_weighted_vbe_meta.cpp 2025-05-07T19:50:48.1863616Z Written: gen_embedding_backward_adam_split_weighted_meta.cpp 2025-05-07T19:50:48.1864124Z Written: gen_embedding_backward_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.1864674Z Written: gen_embedding_backward_adam_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:48.1865181Z Written: gen_embedding_backward_adam_split_unweighted_meta.cpp 2025-05-07T19:50:48.1865722Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.1866358Z Written: gen_embedding_backward_adam_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.1866890Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.1867565Z 
Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.1868076Z Written: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.1868601Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.1869105Z Written: gen_embedding_backward_adam_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.1869639Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.1870178Z Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.1870719Z Written: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.1871218Z Written: gen_embedding_optimizer_adam_split_device_kernel.cuh 2025-05-07T19:50:48.1871628Z Written: gen_embedding_backward_split_adam.cpp 2025-05-07T19:50:48.1872036Z Written: gen_embedding_split_adam_pt2_autograd.cpp 2025-05-07T19:50:48.1872455Z Written: gen_embedding_backward_split_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.1872853Z Written: lookup_adam.py 2025-05-07T19:50:48.1873144Z Written: gen_embedding_backward_split_adam_cpu.cpp 2025-05-07T19:50:48.1873582Z Written: gen_embedding_backward_split_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.1874057Z Written: gen_embedding_backward_lamb_split_weighted_cuda.cu 2025-05-07T19:50:48.1874532Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.1875019Z Written: gen_embedding_backward_lamb_split_unweighted_cuda.cu 2025-05-07T19:50:48.1875466Z Written: gen_embedding_backward_lamb_split_weighted_meta.cpp 2025-05-07T19:50:48.1875959Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.1876436Z Written: gen_embedding_backward_lamb_split_unweighted_meta.cpp 2025-05-07T19:50:48.1876921Z Written: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.1877447Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu 
2025-05-07T19:50:48.1877970Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.1878469Z Written: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.1878982Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.1879580Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.1880078Z Written: gen_embedding_optimizer_lamb_split_device_kernel.cuh 2025-05-07T19:50:48.1880499Z Written: gen_embedding_backward_split_lamb.cpp 2025-05-07T19:50:48.1880864Z Written: gen_embedding_split_lamb_pt2_autograd.cpp 2025-05-07T19:50:48.1881307Z Written: gen_embedding_backward_split_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.1881694Z Written: lookup_lamb.py 2025-05-07T19:50:48.1882007Z Written: gen_embedding_backward_split_lamb_cpu.cpp 2025-05-07T19:50:48.1882424Z Written: gen_embedding_backward_split_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.1882912Z Written: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu 2025-05-07T19:50:48.1883415Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.1883944Z Written: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu 2025-05-07T19:50:48.1884437Z Written: gen_embedding_backward_lars_sgd_split_weighted_meta.cpp 2025-05-07T19:50:48.1884941Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.1885480Z Written: gen_embedding_backward_lars_sgd_split_unweighted_meta.cpp 2025-05-07T19:50:48.1885982Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.1886546Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.1887126Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.1887656Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.1888221Z Written: 
gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.1888844Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.1889381Z Written: gen_embedding_optimizer_lars_sgd_split_device_kernel.cuh 2025-05-07T19:50:48.1889808Z Written: gen_embedding_backward_split_lars_sgd.cpp 2025-05-07T19:50:48.1890208Z Written: gen_embedding_split_lars_sgd_pt2_autograd.cpp 2025-05-07T19:50:48.1890671Z Written: gen_embedding_backward_split_lars_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.1891066Z Written: lookup_lars_sgd.py 2025-05-07T19:50:48.1891405Z Written: gen_embedding_backward_split_lars_sgd_cpu.cpp 2025-05-07T19:50:48.1891844Z Written: gen_embedding_backward_split_lars_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.1892380Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu 2025-05-07T19:50:48.1892963Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.1893571Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu 2025-05-07T19:50:48.1894132Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_meta.cpp 2025-05-07T19:50:48.1894729Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.1895350Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_meta.cpp 2025-05-07T19:50:48.1895942Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.1896591Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.1897232Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.1897858Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.1898512Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu 
2025-05-07T19:50:48.1899157Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.2693475Z Written: gen_embedding_optimizer_partial_rowwise_adam_split_device_kernel.cuh 2025-05-07T19:50:48.2695167Z Written: gen_embedding_backward_split_partial_rowwise_adam.cpp 2025-05-07T19:50:48.2696665Z Written: gen_embedding_split_partial_rowwise_adam_pt2_autograd.cpp 2025-05-07T19:50:48.2698918Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2700981Z Written: lookup_partial_rowwise_adam.py 2025-05-07T19:50:48.2702248Z Written: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp 2025-05-07T19:50:48.2703873Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.2704490Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu 2025-05-07T19:50:48.2705134Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.2705776Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu 2025-05-07T19:50:48.2706517Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_meta.cpp 2025-05-07T19:50:48.2707256Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.2707872Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_meta.cpp 2025-05-07T19:50:48.2708477Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.2709103Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.2709757Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.2710366Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.2711020Z Written: 
gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.2711660Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.2712444Z Written: gen_embedding_optimizer_partial_rowwise_lamb_split_device_kernel.cuh 2025-05-07T19:50:48.2713138Z Written: gen_embedding_backward_split_partial_rowwise_lamb.cpp 2025-05-07T19:50:48.2713637Z Written: gen_embedding_split_partial_rowwise_lamb_pt2_autograd.cpp 2025-05-07T19:50:48.2714215Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2714704Z Written: lookup_partial_rowwise_lamb.py 2025-05-07T19:50:48.2715139Z Written: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp 2025-05-07T19:50:48.2715701Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.2716305Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_cuda.cu 2025-05-07T19:50:48.2716884Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu 2025-05-07T19:50:48.2717430Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_cuda.cu 2025-05-07T19:50:48.2717977Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu 2025-05-07T19:50:48.2718651Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.2719230Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.2719800Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_cuda.cu 2025-05-07T19:50:48.2720350Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:48.2720903Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_cuda.cu 2025-05-07T19:50:48.2721425Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu 2025-05-07T19:50:48.2721973Z Written: 
gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_meta.cpp 2025-05-07T19:50:48.2722515Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_meta.cpp 2025-05-07T19:50:48.2723059Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_meta.cpp 2025-05-07T19:50:48.2723592Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_meta.cpp 2025-05-07T19:50:48.2724137Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.2724730Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.2725295Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_meta.cpp 2025-05-07T19:50:48.2725979Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:48.2726531Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_meta.cpp 2025-05-07T19:50:48.2727087Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_meta.cpp 2025-05-07T19:50:48.2727662Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.2728239Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.2728822Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_cta.cu 2025-05-07T19:50:48.2729369Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.2729969Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.2730595Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.2731203Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.2731817Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.2732397Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_cta.cu 
2025-05-07T19:50:48.2732982Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.2733564Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.2734169Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.2734762Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_warp.cu 2025-05-07T19:50:48.2735371Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.2735977Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.2736595Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.2737235Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.2737840Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.2738453Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_warp.cu 2025-05-07T19:50:48.2739053Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.2739941Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:50:48.2740683Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_cta.cu 2025-05-07T19:50:48.2741365Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:50:48.2742085Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_cta.cu 2025-05-07T19:50:48.2742800Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:50:48.2743482Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_warp.cu 2025-05-07T19:50:48.2744193Z Written: 
gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:50:48.2744885Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_warp.cu 2025-05-07T19:50:48.2745564Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_cuda.cu 2025-05-07T19:50:48.2746185Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_cuda.cu 2025-05-07T19:50:48.2746831Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_cuda.cu 2025-05-07T19:50:48.2747491Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_cuda.cu 2025-05-07T19:50:48.2748084Z Written: gen_embedding_optimizer_rowwise_adagrad_ssd_device_kernel.cuh 2025-05-07T19:50:48.2748678Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:50:48.2749184Z Written: gen_embedding_backward_ssd_rowwise_adagrad.cpp 2025-05-07T19:50:48.2749755Z Written: gen_embedding_ssd_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:50:48.2750281Z Written: gen_embedding_backward_ssd_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2750765Z Written: lookup_rowwise_adagrad_ssd.py 2025-05-07T19:50:48.2751173Z Written: gen_embedding_backward_split_rowwise_adagrad.cpp 2025-05-07T19:50:48.2751645Z Written: gen_embedding_split_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:50:48.2752309Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2752740Z Written: lookup_rowwise_adagrad.py 2025-05-07T19:50:48.2753129Z Written: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp 2025-05-07T19:50:48.2753580Z Written: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp 2025-05-07T19:50:48.2754092Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.2754692Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:50:48.2755250Z Written: gen_embedding_backward_split_approx_rowwise_adagrad.cpp 
2025-05-07T19:50:48.2755780Z Written: gen_embedding_split_approx_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:50:48.2756345Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2756940Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp 2025-05-07T19:50:48.2757505Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.2758177Z Written: gen_embedding_optimizer_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:50:48.2758829Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:50:48.2759459Z Written: gen_embedding_split_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:50:48.2760132Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2760785Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:50:48.2761448Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.2762192Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:50:48.2762863Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:50:48.2763514Z Written: gen_embedding_split_approx_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:50:48.2764214Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.2764931Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:50:48.3731041Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3733158Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_cuda.cu 2025-05-07T19:50:48.3733876Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu 2025-05-07T19:50:48.3734569Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.3735305Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:48.3736099Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu 2025-05-07T19:50:48.3736779Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_meta.cpp 2025-05-07T19:50:48.3737561Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_meta.cpp 2025-05-07T19:50:48.3738212Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.3738899Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:48.3739672Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_meta.cpp 2025-05-07T19:50:48.3740806Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.3741532Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.3742293Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.3743067Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.3743801Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.3744547Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.3745285Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.3746052Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.3746851Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.3747597Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.3748319Z Written: gen_embedding_optimizer_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:50:48.3748942Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp 2025-05-07T19:50:48.3749533Z Written: gen_embedding_split_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:50:48.3750183Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.3750816Z Written: lookup_rowwise_adagrad_with_counter.py 2025-05-07T19:50:48.3751319Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:50:48.3751951Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3752773Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:50:48.3753405Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp 2025-05-07T19:50:48.3754009Z Written: gen_embedding_split_approx_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:50:48.3754668Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.3755311Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:50:48.3755971Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3756618Z Written: gen_embedding_optimizer_rowwise_weighted_adagrad_split_device_kernel.cuh 2025-05-07T19:50:48.3757190Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp 
2025-05-07T19:50:48.3757692Z Written: gen_embedding_split_rowwise_weighted_adagrad_pt2_autograd.cpp 2025-05-07T19:50:48.3758278Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.3758864Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp 2025-05-07T19:50:48.3759421Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3759960Z Written: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu 2025-05-07T19:50:48.3760439Z Written: gen_embedding_backward_sgd_split_weighted_cuda.cu 2025-05-07T19:50:48.3760915Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.3761394Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:48.3761868Z Written: gen_embedding_backward_sgd_split_unweighted_cuda.cu 2025-05-07T19:50:48.3762329Z Written: gen_embedding_backward_sgd_split_weighted_vbe_meta.cpp 2025-05-07T19:50:48.3762799Z Written: gen_embedding_backward_sgd_split_weighted_meta.cpp 2025-05-07T19:50:48.3763265Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.3763848Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:48.3764336Z Written: gen_embedding_backward_sgd_split_unweighted_meta.cpp 2025-05-07T19:50:48.3764820Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.3765330Z Written: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.3765833Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.3766385Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:48.3766889Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.3767399Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.3767920Z Written: 
gen_embedding_backward_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.3768430Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.3768980Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:48.3769495Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.3769992Z Written: gen_embedding_optimizer_sgd_split_device_kernel.cuh 2025-05-07T19:50:48.3770394Z Written: gen_embedding_backward_split_sgd.cpp 2025-05-07T19:50:48.3770773Z Written: gen_embedding_split_sgd_pt2_autograd.cpp 2025-05-07T19:50:48.3771213Z Written: gen_embedding_backward_split_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.3771589Z Written: lookup_sgd.py 2025-05-07T19:50:48.3771898Z Written: gen_embedding_backward_sgd_split_cpu.cpp 2025-05-07T19:50:48.3772262Z Written: gen_embedding_backward_split_sgd_cpu.cpp 2025-05-07T19:50:48.3772747Z Written: gen_embedding_backward_split_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3773217Z Written: gen_embedding_optimizer_approx_sgd_split_device_kernel.cuh 2025-05-07T19:50:48.3773670Z Written: gen_embedding_backward_split_approx_sgd.cpp 2025-05-07T19:50:48.3774068Z Written: gen_embedding_split_approx_sgd_pt2_autograd.cpp 2025-05-07T19:50:48.3774543Z Written: gen_embedding_backward_split_approx_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.3775015Z Written: gen_embedding_backward_split_approx_sgd_cpu.cpp 2025-05-07T19:50:48.3775464Z Written: gen_embedding_backward_split_approx_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3775942Z Written: gen_embedding_backward_none_split_weighted_cuda.cu 2025-05-07T19:50:48.3776391Z Written: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:48.3776887Z Written: gen_embedding_backward_none_split_unweighted_cuda.cu 2025-05-07T19:50:48.3777346Z Written: gen_embedding_backward_none_split_weighted_meta.cpp 2025-05-07T19:50:48.3777825Z Written: 
gen_embedding_backward_none_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:48.3778336Z Written: gen_embedding_backward_none_split_unweighted_meta.cpp 2025-05-07T19:50:48.3778810Z Written: gen_embedding_backward_none_split_weighted_kernel_cta.cu 2025-05-07T19:50:48.3779438Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:48.3780168Z Written: gen_embedding_backward_none_split_unweighted_kernel_cta.cu 2025-05-07T19:50:48.3780794Z Written: gen_embedding_backward_none_split_weighted_kernel_warp.cu 2025-05-07T19:50:48.3781352Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:48.3781944Z Written: gen_embedding_backward_none_split_unweighted_kernel_warp.cu 2025-05-07T19:50:48.3782468Z Written: gen_embedding_optimizer_none_split_device_kernel.cuh 2025-05-07T19:50:48.3782898Z Written: gen_embedding_backward_split_none.cpp 2025-05-07T19:50:48.3783294Z Written: gen_embedding_split_none_pt2_autograd.cpp 2025-05-07T19:50:48.3783739Z Written: gen_embedding_backward_split_none_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.3784159Z Written: lookup_none.py 2025-05-07T19:50:48.3784462Z Written: gen_embedding_backward_split_none_cpu.cpp 2025-05-07T19:50:48.3784920Z Written: gen_embedding_backward_split_none_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.3785499Z Written: gen_embedding_backward_split_weighted_device_kernel_hip.hip 2025-05-07T19:50:48.3786099Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel_hip.hip 2025-05-07T19:50:48.3786710Z Written: gen_embedding_backward_split_unweighted_device_kernel_hip.hip 2025-05-07T19:50:48.3787263Z Written: gen_embedding_backward_ssd_weighted_vbe_device_kernel.cuh 2025-05-07T19:50:48.3787814Z Written: gen_embedding_backward_split_weighted_vbe_device_kernel.cuh 2025-05-07T19:50:48.3788350Z Written: gen_embedding_backward_ssd_weighted_device_kernel.cuh 2025-05-07T19:50:48.3788887Z Written: gen_embedding_backward_split_weighted_device_kernel.cuh 
2025-05-07T19:50:48.3789433Z Written: gen_embedding_backward_ssd_unweighted_nobag_device_kernel.cuh 2025-05-07T19:50:48.3790037Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel.cuh 2025-05-07T19:50:48.3790615Z Written: gen_embedding_backward_ssd_unweighted_vbe_device_kernel.cuh 2025-05-07T19:50:48.3791169Z Written: gen_embedding_backward_split_unweighted_vbe_device_kernel.cuh 2025-05-07T19:50:48.3791731Z Written: gen_embedding_backward_ssd_unweighted_device_kernel.cuh 2025-05-07T19:50:48.3792366Z Written: gen_embedding_backward_split_unweighted_device_kernel.cuh 2025-05-07T19:50:48.3792870Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:50:48.3793326Z Written: gen_embedding_backward_split_grad_embedding_ops.cu 2025-05-07T19:50:48.3793826Z Written: gen_embedding_backward_dense_indice_weights_codegen_cuda.cu 2025-05-07T19:50:48.3794344Z Written: gen_embedding_backward_ssd_indice_weights_codegen_cuda.cu 2025-05-07T19:50:48.3794844Z Written: gen_embedding_backward_split_indice_weights_codegen_cuda.cu 2025-05-07T19:50:48.3795336Z Written: pt2_arg_utils.h 2025-05-07T19:50:48.3795598Z Written: __init__.py 2025-05-07T19:50:48.3795869Z Written: lookup_args_ssd.py 2025-05-07T19:50:48.3796145Z Written: lookup_args.py 2025-05-07T19:50:48.3863569Z 2025-05-07T19:50:48.3863705Z 2025-05-07T19:50:48.3864277Z ================================================================================ 2025-05-07T19:50:48.3865440Z Running code generation script ... 
2025-05-07T19:50:48.3867621Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py --opensource 2025-05-07T19:50:48.3868456Z ================================================================================ 2025-05-07T19:50:48.3868697Z 2025-05-07T19:50:48.4932814Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:48.4933747Z [GENERATE OPTIMIZERS]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py', '--opensource'] 2025-05-07T19:50:48.4934566Z Written: gen_embedding_optimizer_rowwise_adagrad_split_cuda.cu 2025-05-07T19:50:48.4935085Z Written: gen_embedding_optimizer_rowwise_adagrad_split_kernel.cu 2025-05-07T19:50:48.4935614Z Written: gen_embedding_optimizer_rowwise_adagrad_split.cpp 2025-05-07T19:50:48.4936143Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:50:48.4936693Z Written: split_embedding_optimizer_rowwise_adagrad.py 2025-05-07T19:50:48.4937225Z Written: optimizer_args.py 2025-05-07T19:50:48.5042203Z 2025-05-07T19:50:48.5042294Z 2025-05-07T19:50:48.5042858Z ================================================================================ 2025-05-07T19:50:48.5044005Z Running code generation script ... 
2025-05-07T19:50:48.5046357Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py --opensource 2025-05-07T19:50:48.5048780Z ================================================================================ 2025-05-07T19:50:48.5049512Z 2025-05-07T19:50:48.6203563Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:48.6206284Z [GENERATE FORWARD QUANTIZED]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py', '--opensource'] 2025-05-07T19:50:48.6209346Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu 2025-05-07T19:50:48.6210028Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu 2025-05-07T19:50:48.6210669Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu 2025-05-07T19:50:48.6211319Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu 2025-05-07T19:50:48.6211961Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu 2025-05-07T19:50:48.6212613Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu 2025-05-07T19:50:48.6213311Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu 2025-05-07T19:50:48.6214015Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu 2025-05-07T19:50:48.6214731Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu 2025-05-07T19:50:48.6215430Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu 2025-05-07T19:50:48.6216142Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu 2025-05-07T19:50:48.6216857Z Written: 
gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu 2025-05-07T19:50:48.6217534Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu 2025-05-07T19:50:48.6218203Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu 2025-05-07T19:50:48.6218858Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu 2025-05-07T19:50:48.6219762Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu 2025-05-07T19:50:48.6220665Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu 2025-05-07T19:50:48.6221392Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu 2025-05-07T19:50:48.6222092Z Written: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu 2025-05-07T19:50:48.6222763Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu 2025-05-07T19:50:48.6223459Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu 2025-05-07T19:50:48.6224054Z Written: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp 2025-05-07T19:50:48.6224597Z Written: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp 2025-05-07T19:50:48.6306892Z 2025-05-07T19:50:48.6307026Z 2025-05-07T19:50:48.6307563Z ================================================================================ 2025-05-07T19:50:48.6308661Z Running code generation script ... 
2025-05-07T19:50:48.6310966Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py --opensource 2025-05-07T19:50:48.6313305Z ================================================================================ 2025-05-07T19:50:48.6313983Z 2025-05-07T19:50:48.9659659Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:48.9660624Z [GENERATE FORWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py', '--opensource'] 2025-05-07T19:50:48.9661370Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_cuda.cu 2025-05-07T19:50:48.9661889Z Written: gen_embedding_forward_dense_weighted_codegen_cuda.cu 2025-05-07T19:50:48.9662390Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:50:48.9662932Z Written: gen_embedding_forward_dense_unweighted_codegen_cuda.cu 2025-05-07T19:50:48.9663438Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_cuda.cu 2025-05-07T19:50:48.9664192Z Written: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu 2025-05-07T19:50:48.9664918Z Written: gen_embedding_forward_ssd_weighted_codegen_cuda.cu 2025-05-07T19:50:48.9665382Z Written: gen_embedding_forward_split_weighted_codegen_cuda.cu 2025-05-07T19:50:48.9665863Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:50:48.9666347Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:50:48.9666844Z Written: gen_embedding_forward_ssd_unweighted_codegen_cuda.cu 2025-05-07T19:50:48.9667304Z Written: gen_embedding_forward_split_unweighted_codegen_cuda.cu 2025-05-07T19:50:48.9667811Z Written: gen_embedding_forward_split_weighted_vbe_gwd_codegen_cuda.cu 2025-05-07T19:50:48.9668322Z Written: gen_embedding_forward_split_weighted_gwd_codegen_cuda.cu 2025-05-07T19:50:48.9668823Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_codegen_cuda.cu 
2025-05-07T19:50:48.9669349Z Written: gen_embedding_forward_split_unweighted_gwd_codegen_cuda.cu 2025-05-07T19:50:48.9669838Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_meta.cpp 2025-05-07T19:50:48.9670333Z Written: gen_embedding_forward_dense_weighted_codegen_meta.cpp 2025-05-07T19:50:48.9670814Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:50:48.9671316Z Written: gen_embedding_forward_dense_unweighted_codegen_meta.cpp 2025-05-07T19:50:48.9671826Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_meta.cpp 2025-05-07T19:50:48.9672302Z Written: gen_embedding_forward_split_weighted_vbe_codegen_meta.cpp 2025-05-07T19:50:48.9672782Z Written: gen_embedding_forward_ssd_weighted_codegen_meta.cpp 2025-05-07T19:50:48.9673229Z Written: gen_embedding_forward_split_weighted_codegen_meta.cpp 2025-05-07T19:50:48.9673709Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:50:48.9674326Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:50:48.9674814Z Written: gen_embedding_forward_ssd_unweighted_codegen_meta.cpp 2025-05-07T19:50:48.9675292Z Written: gen_embedding_forward_split_unweighted_codegen_meta.cpp 2025-05-07T19:50:48.9675744Z Written: gen_embedding_forward_dense_weighted_vbe_kernel.cu 2025-05-07T19:50:48.9676188Z Written: gen_embedding_forward_dense_weighted_kernel.cu 2025-05-07T19:50:48.9676619Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel.cu 2025-05-07T19:50:48.9677093Z Written: gen_embedding_forward_dense_unweighted_vbe_kernel.cu 2025-05-07T19:50:48.9677542Z Written: gen_embedding_forward_dense_unweighted_kernel.cu 2025-05-07T19:50:48.9677957Z Written: gen_embedding_forward_ssd_weighted_vbe_kernel.cu 2025-05-07T19:50:48.9678402Z Written: gen_embedding_forward_split_weighted_vbe_kernel.cu 2025-05-07T19:50:48.9678817Z Written: gen_embedding_forward_ssd_weighted_kernel.cu 2025-05-07T19:50:48.9679239Z Written: 
gen_embedding_forward_split_weighted_kernel.cu 2025-05-07T19:50:48.9679661Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel.cu 2025-05-07T19:50:48.9680144Z Written: gen_embedding_forward_split_unweighted_nobag_kernel.cu 2025-05-07T19:50:48.9680601Z Written: gen_embedding_forward_ssd_unweighted_vbe_kernel.cu 2025-05-07T19:50:48.9681063Z Written: gen_embedding_forward_split_unweighted_vbe_kernel.cu 2025-05-07T19:50:48.9681509Z Written: gen_embedding_forward_ssd_unweighted_kernel.cu 2025-05-07T19:50:48.9681919Z Written: gen_embedding_forward_split_unweighted_kernel.cu 2025-05-07T19:50:48.9682371Z Written: gen_embedding_forward_split_weighted_vbe_gwd_kernel.cu 2025-05-07T19:50:48.9682818Z Written: gen_embedding_forward_split_weighted_gwd_kernel.cu 2025-05-07T19:50:48.9683287Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_kernel.cu 2025-05-07T19:50:48.9683749Z Written: gen_embedding_forward_split_unweighted_gwd_kernel.cu 2025-05-07T19:50:48.9684206Z Written: gen_embedding_forward_split_weighted_v2_kernel.cu 2025-05-07T19:50:48.9684657Z Written: gen_embedding_forward_split_unweighted_v2_kernel.cu 2025-05-07T19:50:48.9685130Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:48.9685657Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:48.9686218Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:48.9686739Z Written: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:48.9687209Z Written: gen_embedding_forward_split_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.9687653Z Written: gen_embedding_forward_split_pt2_cpu_wrapper.cpp 2025-05-07T19:50:48.9688084Z Written: gen_embedding_forward_ssd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:48.9782395Z 2025-05-07T19:50:48.9782857Z 2025-05-07T19:50:48.9783490Z ================================================================================ 2025-05-07T19:50:48.9784349Z Running code 
generation script ... 2025-05-07T19:50:48.9785161Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py --opensource 2025-05-07T19:50:48.9785975Z ================================================================================ 2025-05-07T19:50:48.9786224Z 2025-05-07T19:50:49.2389868Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:49.2390779Z [INDEX SELECT GENERATOR]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py', '--opensource'] 2025-05-07T19:50:49.2391533Z Written: gen_batch_index_select_dim0_forward_codegen_cuda.cu 2025-05-07T19:50:49.2392005Z Written: gen_batch_index_select_dim0_forward_kernel.cu 2025-05-07T19:50:49.2392451Z Written: gen_batch_index_select_dim0_forward_kernel_small.cu 2025-05-07T19:50:49.2392937Z Written: gen_batch_index_select_dim0_backward_codegen_cuda.cu 2025-05-07T19:50:49.2393412Z Written: gen_batch_index_select_dim0_backward_kernel_cta.cu 2025-05-07T19:50:49.2394151Z Written: gen_batch_index_select_dim0_backward_kernel_warp.cu 2025-05-07T19:50:49.2394787Z Written: gen_embedding_backward_split_batch_index_select_device_kernel.cuh 2025-05-07T19:50:49.2395318Z Written: gen_embedding_backward_split_grad_index_select.cu 2025-05-07T19:50:49.2395817Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:50:49.2585261Z 2025-05-07T19:50:49.2585718Z 2025-05-07T19:50:49.2586149Z ================================================================================ 2025-05-07T19:50:49.2586680Z GPU CPP Library Target: fbgemm_gpu_experimental_gen_ai (SHARED) 2025-05-07T19:50:49.2587100Z 2025-05-07T19:50:49.2587327Z CPU_SRCS: 2025-05-07T19:50:49.2587752Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:50:49.2588394Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:50:49.2589033Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:50:49.2589659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:50:49.2590285Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:50:49.2590999Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:50:49.2591623Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:50:49.2592114Z 2025-05-07T19:50:49.2592313Z GPU_SRCS: 2025-05-07T19:50:49.2592828Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:50:49.2593494Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:50:49.2594116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:50:49.2594691Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:50:49.2595335Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:50:49.2596012Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:50:49.2596628Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:50:49.2597087Z 2025-05-07T19:50:49.2597288Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:49.2598069Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 2025-05-07T19:50:49.2598923Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:50:49.2599815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:50:49.2600971Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:50:49.2601841Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:50:49.2602738Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:50:49.2603732Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:50:49.2604749Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:50:49.2605735Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:50:49.2606732Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:50:49.2607719Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:50:49.2608688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:50:49.2609678Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:50:49.2610773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:50:49.2611769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:50:49.2612760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:50:49.2613740Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:50:49.2614738Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:50:49.2615841Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:50:49.2616785Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:50:49.2617755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:50:49.2618704Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:50:49.2619948Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:50:49.2620978Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:50:49.2621961Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:50:49.2622954Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:50:49.2623932Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:50:49.2624913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:50:49.2625997Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:50:49.2626865Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:50:49.2627696Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:50:49.2628536Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:50:49.2629380Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:50:49.2630215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:50:49.2631231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:50:49.2632427Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:50:49.2633603Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:50:49.2634788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:50:49.2636064Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:50:49.2637403Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:50:49.2638647Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:50:49.2639831Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:50:49.2640985Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:50:49.2642342Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:50:49.2643815Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:50:49.2645066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:50:49.2646255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:50:49.2647414Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:50:49.2648448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:50:49.2649355Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:50:49.2650200Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:50:49.2651055Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:50:49.2652052Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:50:49.2652901Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:50:49.2653705Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:50:49.2655523Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:50:49.2656332Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:50:49.2657074Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:50:49.2657860Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:50:49.2658621Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:50:49.2659454Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:50:49.2660411Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:50:49.2660929Z 2025-05-07T19:50:49.2661159Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:49.2661563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/ck_extensions.hip 2025-05-07T19:50:49.2662173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/gemm.cpp 2025-05-07T19:50:49.2662923Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/bf16_grouped_gemm.hip 2025-05-07T19:50:49.2664157Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x128_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2665708Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2667295Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2668832Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:50:49.2670381Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:50:49.2671889Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:49.2673447Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2674877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2676284Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2677701Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2679113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x64_16x16_1x3_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2680516Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x16x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2681931Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2683409Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2684817Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x96x128_16x16_2x3_16x8x1_16x8x1_1x32x1x4_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:50:49.2686240Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x128x64_32x32_2x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2687669Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x96x64_16x16_4x3_8x16x1_8x16x1_1x32x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2689089Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x128_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2690533Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2691970Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2693382Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x224x64_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2694815Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x256x64_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2696384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x96x64_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2697809Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.2699256Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:49.2701241Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x64x128_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2702784Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x224x256x32_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2704330Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x128x32_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2705866Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x160x64_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2707392Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x192x64_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2708925Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x224x64_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2710480Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x256x64_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2791298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x128x128_16x16_1x4_16x16x1_16x16x1_1x32x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:49.2793095Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x224x64_16x16_1x7_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2794591Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2796062Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2797575Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x128x128_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2799097Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x192x128_16x16_4x3_16x16x1_16x16x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2801017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x96x64_16x16_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2802546Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2804070Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2805669Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x64_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2807169Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x32x128_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:49.2808681Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x48x128_16x16_1x3_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2810177Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x64x128_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:49.2811318Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/ck_utility.hip 2025-05-07T19:50:49.2812117Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip 2025-05-07T19:50:49.2812976Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/fp8_rowwise_gemm.hip 2025-05-07T19:50:49.2814253Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x16x128_16x16_4x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2815995Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x32x128_32x32_2x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2817447Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2818942Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_4_split_k.hip 2025-05-07T19:50:49.2820844Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_8_split_k.hip 2025-05-07T19:50:49.2822390Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2823885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2825407Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:50:49.2826940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2828532Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2830045Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2_2_split_k.hip 2025-05-07T19:50:49.2831564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2833086Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2_2_split_k.hip 2025-05-07T19:50:49.2834688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2836188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2837682Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2839160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2840736Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x256_16x16_1x1_16x8x1_16x8x1_1x32x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2842129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2843493Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2844869Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2846247Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2847616Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2849045Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2850551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_16x16_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2851938Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2853329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2854731Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:50:49.2856133Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2857529Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x64_32x32_2x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:50:49.2858930Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_16x16_4x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2860618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2862222Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2863737Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2865237Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2866755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x256_32x32_2x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2868267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2869776Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2871305Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x128x128_16x16_5x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2872872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x256x128_16x16_5x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2874254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x96x128_16x16_5x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2875690Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x128_16x16_1x1_16x16x1_8x32x1_1x16x1x16_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:50:49.2877183Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2878562Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2879948Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x128x128_16x16_6x4_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:50:49.2881337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x192x128_16x16_6x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2882733Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x224x128_16x16_6x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2884133Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2885523Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:50:49.2886921Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x160x128_16x16_7x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2888316Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x192x128_16x16_7x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2889763Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2891162Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2892561Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_32x32_4x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2893950Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2895351Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2896753Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2898137Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2899583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2901619Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_16x16_8x8_4x64x1_4x64x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2903110Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_32x32_4x4_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:50:49.2904612Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_16x16_8x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2906249Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_32x32_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2907745Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2909257Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2910757Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x128_32x32_1x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2912263Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2913776Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x16x512_16x16_1x1_32x8x1_32x8x1_1x64x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2915151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x128_32x32_1x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2916695Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2918165Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x256x128_32x32_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2919553Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2920919Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2922303Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x96x256_16x16_2x3_16x16x1_16x16x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2923700Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x80x128x256_16x16_5x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2925094Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x96x128x128_16x16_3x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2926472Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2927854Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2929212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2930579Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x4x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2931943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2933367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2934734Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x64_16x16_1x1_4x16x1_4x16x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2935899Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/fp8_rowwise_batched_gemm.hip 2025-05-07T19:50:49.2937146Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2938658Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.2940431Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2942066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2943686Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2945307Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2946987Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.2948599Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2950209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2951831Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2953464Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2954967Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:50:49.2956481Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:50:49.2957999Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2959501Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2961058Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2962569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2964056Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2965572Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2967079Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2968584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2970090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2971604Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2973157Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2974673Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.2976183Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2977677Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2979180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.2981002Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2982642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2984268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2985896Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.2987506Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2989197Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2990808Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v1.hip 2025-05-07T19:50:49.2992463Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v2.hip 2025-05-07T19:50:49.2993676Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/fp8_rowwise_grouped_gemm.hip 2025-05-07T19:50:49.2994915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2996409Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.2997901Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.2999407Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3001244Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.3002990Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.3004606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:50:49.3006219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:50:49.3007842Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.3009467Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x96x256_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.3011079Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3012717Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_16x16_1x4_16x8x1_16x8x1_1x32x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.3014339Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.3015990Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.3017569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_2x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3019084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.3020896Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3022546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3024199Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3025840Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x256x128_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3027470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3029114Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.3030751Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:49.3032570Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.3034160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3035769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.3037580Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.3039230Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.3040859Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x192x96x128_16x16_6x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3042502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:49.3044131Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x128x64_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:49.3045770Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3047458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3049179Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3050764Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3052389Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3053904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x128x128_16x16_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:50:49.3055598Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3057188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.3058773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x256x128_16x16_1x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.3060646Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3062276Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:49.3063895Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x64x512_16x16_2x1_32x8x1_32x8x1_1x32x1x8_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:50:49.3065513Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3067162Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:49.3068802Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x160x128_16x16_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3070438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x192x128_16x16_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:49.3072163Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3073642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:49.3075137Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:49.3076676Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x32x256_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.3078164Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:50:49.3079644Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:49.3080759Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip 2025-05-07T19:50:49.3081262Z 2025-05-07T19:50:49.3081430Z OTHER_SRCS: 2025-05-07T19:50:49.3081565Z 2025-05-07T19:50:49.3081642Z 2025-05-07T19:50:49.3081807Z CC_FLAGS: 2025-05-07T19:50:49.3081911Z 2025-05-07T19:50:49.3081976Z 2025-05-07T19:50:49.3082133Z NVCC_FLAGS: 2025-05-07T19:50:49.3082236Z 2025-05-07T19:50:49.3082300Z 2025-05-07T19:50:49.3082463Z HIPCC_FLAGS: 2025-05-07T19:50:49.3082573Z 2025-05-07T19:50:49.3082637Z 2025-05-07T19:50:49.3082806Z INCLUDE_DIRS: 2025-05-07T19:50:49.3083034Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:49.3083355Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:49.3083632Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:49.3083941Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:49.3084417Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 2025-05-07T19:50:49.3085172Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:49.3086607Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:49.3086981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:49.3087375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:49.3087805Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:49.3088286Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 
2025-05-07T19:50:49.3088718Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:49.3089241Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 2025-05-07T19:50:49.3089811Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize 2025-05-07T19:50:49.3090144Z 2025-05-07T19:50:49.3090320Z Selected Source Files: 2025-05-07T19:50:49.3090687Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:50:49.3091248Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:50:49.3091793Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:50:49.3092314Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:50:49.3092872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:50:49.3093467Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:50:49.3094018Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:50:49.3094583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:50:49.3095161Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:50:49.3095705Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:50:49.3096207Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:50:49.3096749Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:50:49.3097337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:50:49.3097887Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:50:49.3098629Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 
2025-05-07T19:50:49.3099504Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:50:49.3100688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:50:49.3101613Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:50:49.3102462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:50:49.3103317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:50:49.3104293Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:50:49.3105260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:50:49.3106212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:50:49.3107176Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:50:49.3108121Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:50:49.3109079Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:50:49.3110032Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:50:49.3111122Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:50:49.3112080Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:50:49.3113130Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:50:49.3114164Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:50:49.3115053Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:50:49.3116095Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:50:49.3116983Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:50:49.3117876Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:50:49.3118753Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:50:49.3119642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:50:49.3120515Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:50:49.3121598Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:50:49.3122534Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:50:49.3123456Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:50:49.3124393Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:50:49.3125390Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:50:49.3126222Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:50:49.3127006Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:50:49.3127805Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:50:49.3128595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:50:49.3129369Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:50:49.3130342Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:50:49.3131476Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:50:49.3132586Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:50:49.3133797Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:50:49.3134857Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:50:49.3135913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:50:49.3137034Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:50:49.3138086Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:50:49.3139148Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:50:49.3140729Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:50:49.3142151Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:50:49.3143394Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:50:49.3144553Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:50:49.3145682Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:50:49.3146688Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:50:49.3147558Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:50:49.3148399Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:50:49.3149226Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:50:49.3150092Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:50:49.3150924Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:50:49.3151702Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:50:49.3152683Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:50:49.3153441Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:50:49.3154160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:50:49.3154917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:50:49.3155646Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:50:49.3156378Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:50:49.3157105Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:50:49.3157589Z 2025-05-07T19:50:49.3157783Z HIPified Source Files: 2025-05-07T19:50:49.3157936Z 2025-05-07T19:50:49.3158005Z 2025-05-07T19:50:49.3158198Z Library Dependencies: 2025-05-07T19:50:49.3158410Z torch 2025-05-07T19:50:49.3158602Z torch_library 2025-05-07T19:50:49.3159026Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so 2025-05-07T19:50:49.3159710Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:49.3160407Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:49.3161189Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 
2025-05-07T19:50:49.3161916Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so
2025-05-07T19:50:49.3162504Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so
2025-05-07T19:50:49.3162946Z
2025-05-07T19:50:49.3163120Z Output Library:
2025-05-07T19:50:49.3163350Z fbgemm_gpu_experimental_gen_ai
2025-05-07T19:50:49.3163593Z
2025-05-07T19:50:49.3163892Z Destination Directory:
2025-05-07T19:50:49.3164036Z
2025-05-07T19:50:49.3164153Z ================================================================================
2025-05-07T19:50:49.3164366Z
2025-05-07T19:50:49.3164370Z
2025-05-07T19:50:49.3164373Z
2025-05-07T19:50:49.3164471Z ================================================================================
2025-05-07T19:50:49.3164816Z Adding to Package: fbgemm_gpu/experimental/gen_ai
2025-05-07T19:50:49.3165112Z
2025-05-07T19:50:49.3165281Z TARGETS:
2025-05-07T19:50:49.3165478Z fbgemm_gpu_experimental_gen_ai
2025-05-07T19:50:49.3165718Z
2025-05-07T19:50:49.3165882Z FILES:
2025-05-07T19:50:49.3165978Z
2025-05-07T19:50:49.3166080Z ================================================================================
2025-05-07T19:50:49.3166295Z
2025-05-07T19:50:49.3166299Z
2025-05-07T19:50:49.3166308Z
2025-05-07T19:50:49.3166406Z ================================================================================
2025-05-07T19:50:49.3166793Z GPU CPP Library Target: fbgemm_gpu_experimental_example_py (SHARED)
2025-05-07T19:50:49.3167150Z
2025-05-07T19:50:49.3167309Z CPU_SRCS:
2025-05-07T19:50:49.3167418Z
2025-05-07T19:50:49.3167483Z
2025-05-07T19:50:49.3167652Z GPU_SRCS:
2025-05-07T19:50:49.3167962Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp
2025-05-07T19:50:49.3168485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp
2025-05-07T19:50:49.3169007Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu
2025-05-07T19:50:49.3169401Z
2025-05-07T19:50:49.3169573Z CUDA_SPECIFIC_SRCS:
2025-05-07T19:50:49.3169709Z
2025-05-07T19:50:49.3169776Z
2025-05-07T19:50:49.3169945Z HIP_SPECIFIC_SRCS:
2025-05-07T19:50:49.3170078Z
2025-05-07T19:50:49.3170147Z
2025-05-07T19:50:49.3170321Z OTHER_SRCS:
2025-05-07T19:50:49.3170428Z
2025-05-07T19:50:49.3170495Z
2025-05-07T19:50:49.3170661Z CC_FLAGS:
2025-05-07T19:50:49.3170764Z
2025-05-07T19:50:49.3170834Z
2025-05-07T19:50:49.3171001Z NVCC_FLAGS:
2025-05-07T19:50:49.3171105Z
2025-05-07T19:50:49.3171176Z
2025-05-07T19:50:49.3171344Z HIPCC_FLAGS:
2025-05-07T19:50:49.3171515Z
2025-05-07T19:50:49.3171583Z
2025-05-07T19:50:49.3171755Z INCLUDE_DIRS:
2025-05-07T19:50:49.3171964Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include
2025-05-07T19:50:49.3172258Z /__w/FBGEMM/FBGEMM/fbgemm_gpu
2025-05-07T19:50:49.3172523Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include
2025-05-07T19:50:49.3172803Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include
2025-05-07T19:50:49.3173266Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include
2025-05-07T19:50:49.3174181Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include
2025-05-07T19:50:49.3174823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src
2025-05-07T19:50:49.3175225Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include
2025-05-07T19:50:49.3175651Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include
2025-05-07T19:50:49.3176118Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include
2025-05-07T19:50:49.3176627Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include
2025-05-07T19:50:49.3177077Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include
2025-05-07T19:50:49.3177622Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include
2025-05-07T19:50:49.3178108Z
2025-05-07T19:50:49.3178292Z Selected Source Files:
2025-05-07T19:50:49.3178670Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp
2025-05-07T19:50:49.3179211Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp
2025-05-07T19:50:49.3180037Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu
2025-05-07T19:50:49.3180464Z
2025-05-07T19:50:49.3180742Z HIPified Source Files:
2025-05-07T19:50:49.3180894Z
2025-05-07T19:50:49.3180982Z
2025-05-07T19:50:49.3181165Z Library Dependencies:
2025-05-07T19:50:49.3181392Z torch
2025-05-07T19:50:49.3181570Z torch_library
2025-05-07T19:50:49.3182009Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so
2025-05-07T19:50:49.3182691Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so
2025-05-07T19:50:49.3183392Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so
2025-05-07T19:50:49.3184198Z /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2
2025-05-07T19:50:49.3184937Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so
2025-05-07T19:50:49.3185552Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so
2025-05-07T19:50:49.3185948Z
2025-05-07T19:50:49.3186137Z Output Library:
2025-05-07T19:50:49.3186376Z fbgemm_gpu_experimental_example_py
2025-05-07T19:50:49.3186651Z
2025-05-07T19:50:49.3186841Z Destination Directory:
2025-05-07T19:50:49.3187002Z
2025-05-07T19:50:49.3187110Z ================================================================================
2025-05-07T19:50:49.3187338Z
2025-05-07T19:50:49.3187342Z
2025-05-07T19:50:49.3187345Z
2025-05-07T19:50:49.3187469Z ================================================================================
2025-05-07T19:50:49.3187837Z Adding to Package: fbgemm_gpu/experimental/example
2025-05-07T19:50:49.3188172Z
2025-05-07T19:50:49.3188346Z TARGETS:
2025-05-07T19:50:49.3188568Z fbgemm_gpu_experimental_example_py
2025-05-07T19:50:49.3188856Z
2025-05-07T19:50:49.3189022Z FILES:
2025-05-07T19:50:49.3189359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/__init__.py
2025-05-07T19:50:49.3189898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/utils.py
2025-05-07T19:50:49.3190336Z ================================================================================
2025-05-07T19:50:49.3190570Z
2025-05-07T19:50:49.3190573Z
2025-05-07T19:50:49.3190578Z
2025-05-07T19:50:49.3190693Z ================================================================================
2025-05-07T19:50:49.3191094Z Adding to Package: fbgemm_gpu/experimental/gemm/triton_gemm
2025-05-07T19:50:49.3191457Z
2025-05-07T19:50:49.3191700Z TARGETS:
2025-05-07T19:50:49.3191814Z
2025-05-07T19:50:49.3191884Z
2025-05-07T19:50:49.3192175Z FILES:
2025-05-07T19:50:49.3192476Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py
2025-05-07T19:50:49.3192998Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py
2025-05-07T19:50:49.3193527Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py
2025-05-07T19:50:49.3194102Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py
2025-05-07T19:50:49.3194634Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py
2025-05-07T19:50:49.3195036Z ================================================================================
2025-05-07T19:50:49.3195250Z
2025-05-07T19:50:49.3195356Z -- Configuring done (7.7s)
2025-05-07T19:50:49.3195623Z -- Generating done (0.0s)
2025-05-07T19:50:49.3196085Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build
2025-05-07T19:50:49.3196728Z Change Dir: '/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build'
2025-05-07T19:50:49.3197074Z
2025-05-07T19:50:49.3197343Z Run Build Command(s): /github/home/miniconda/envs/build_binary/bin/ninja -v -j 48 install
2025-05-07T19:50:49.4555149Z [1/156]
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:50:49.4682253Z [2/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:50:49.4768996Z [3/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:50:49.4793272Z [4/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:50:49.4897342Z [5/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:50:49.4973918Z [6/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE 
-Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:50:49.5057833Z [7/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:50:49.5114759Z [8/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:50:49.5169545Z [9/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:50:49.5322578Z [10/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:50:49.5464869Z [11/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:50:49.5606342Z [12/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:50:49.5715608Z [13/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:50:49.5776238Z [14/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:50:49.5926696Z [15/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:50:49.6000617Z [16/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:50:49.6011432Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp:10: 2025-05-07T19:50:49.6013296Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.6016652Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6020957Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.6022961Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6024416Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.6027631Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6031433Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.6033466Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6034961Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.6038292Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6042489Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.6044619Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6046271Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.6049852Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6053782Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return 
_signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.6055773Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6057432Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.6060834Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6064621Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.6066654Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6068411Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.6071743Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6075564Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.6077469Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6078981Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.6082242Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6085985Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.6087986Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6089762Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.6093020Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6096921Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.6098954Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6100940Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.6104336Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6108241Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return 
_signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.6110219Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6111829Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.6115422Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6119259Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.6121325Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6121997Z At global scope: 2025-05-07T19:50:49.6123266Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.6152483Z [17/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:50:49.6171701Z [18/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:50:49.6224902Z [19/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:50:49.6396461Z [20/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:50:49.6629702Z [21/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:50:49.6641403Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp:10: 2025-05-07T19:50:49.6643327Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.6646576Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6650424Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.6652462Z | 
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6654046Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.6657299Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6661531Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.6663566Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6665172Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.6668495Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6672324Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.6674333Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6675905Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.6679292Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 
'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6683151Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.6684991Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6686599Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.6689974Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6693759Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.6695780Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6697342Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.6701029Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6704888Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.6706889Z | 
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6708731Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.6711971Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6715768Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.6717868Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6719487Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.6723018Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6727122Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.6729278Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6731204Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.6734744Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 
'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6738241Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.6740312Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6741880Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.6745235Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6749113Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.6751195Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6751826Z At global scope: 2025-05-07T19:50:49.6753106Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.6763842Z [22/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:50:49.6783182Z [23/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:50:49.6806071Z [24/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:50:49.6825829Z [25/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:50:49.6844929Z [26/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:50:49.6855629Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64instdb_p.h:12, 2025-05-07T19:50:49.6856896Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp:11: 2025-05-07T19:50:49.6858968Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.6862547Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated 
[-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6866459Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.6868503Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6870100Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.6873522Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6877450Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.6879501Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6881122Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.6884769Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6888661Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.6890721Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6892340Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.6895638Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6899568Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.6901666Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6903289Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.6906714Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6910908Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.6912970Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6914561Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.6917971Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 
'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6921908Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.6923932Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6925532Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.6928978Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6932868Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.6935151Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6936765Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.6940305Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6944220Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.6946284Z | 
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6947892Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.6951349Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6955267Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.6957503Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6959130Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.6962474Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.6966422Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.6968490Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.6969143Z At global scope: 2025-05-07T19:50:49.6970418Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.7008004Z [27/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:50:49.7018749Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64instdb_p.h:12, 2025-05-07T19:50:49.7020102Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp:13: 2025-05-07T19:50:49.7021905Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.7025299Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7029161Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask 
| kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7031189Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7032757Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.7036118Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7040169Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7042180Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7043750Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.7047143Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7050963Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7052947Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7054514Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.7057853Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7061614Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.7063611Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7065199Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.7068454Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7072283Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7074279Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7075846Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.7079213Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7083026Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return 
_signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7085157Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7086748Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.7090105Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7093888Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7095923Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7097497Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.7101160Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7104982Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.7107017Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7108786Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.7112156Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7115971Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.7118028Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7119610Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.7122945Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7126778Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.7128797Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7129596Z At global scope: 2025-05-07T19:50:49.7130851Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.7141907Z [28/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:50:49.7161682Z [29/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:50:49.7172256Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64instdb_p.h:12, 2025-05-07T19:50:49.7173506Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp:13: 2025-05-07T19:50:49.7175283Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.7178635Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7182547Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7184572Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7186133Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.7189456Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7193405Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7195425Z | 
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7196980Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.7200578Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7204412Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7206410Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7207984Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.7211333Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7214958Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.7217021Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7218609Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.7222043Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different 
enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7225823Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7227869Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7229439Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.7232775Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7236565Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7238771Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7240331Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.7243679Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7247475Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 
2025-05-07T19:50:49.7249485Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7251056Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.7254391Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7258134Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.7260291Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7261879Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.7265373Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7269210Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.7271244Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7272832Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.7276178Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7280005Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.7282039Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7282795Z At global scope: 2025-05-07T19:50:49.7284044Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.7294797Z [30/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp 
2025-05-07T19:50:49.7314708Z [31/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:50:49.7656032Z [32/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:50:49.7675527Z [33/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:50:49.7686405Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/a64archtraits_p.h:13, 2025-05-07T19:50:49.7687743Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp:16: 2025-05-07T19:50:49.7689567Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.7693059Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7696945Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7698960Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7700916Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.7704628Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7708515Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7710562Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7712165Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.7715598Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7719353Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const 
noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7721370Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7722948Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.7726638Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7730362Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.7732244Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7733842Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.7737290Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7741325Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7743362Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7744966Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.7748414Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7752475Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7754507Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7756109Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.7759527Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7763417Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7765462Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7767039Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.7770462Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 
2025-05-07T19:50:49.7774473Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.7776504Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7778104Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.7781659Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7785503Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.7787535Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7789144Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.7792573Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7796423Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.7798457Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7799137Z At global scope: 
2025-05-07T19:50:49.7802919Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.7812115Z [34/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:50:49.7846941Z [35/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:50:49.7858158Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp:12: 2025-05-07T19:50:49.7860188Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.7863572Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7867523Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7869606Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7871185Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.7874570Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7878597Z 133 | 
ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7880686Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7882230Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.7885607Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7889483Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7891532Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7893119Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.7896373Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7900574Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.7902493Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7904088Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.7907467Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7911382Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.7913485Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7915054Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.7918397Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7922303Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.7924375Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7926165Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.7929550Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7933426Z 139 | 
ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.7935477Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7937051Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.7940507Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7944344Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.7946399Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7948186Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.7951513Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7955378Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.7957455Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7959025Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr 
bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.7962401Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.7966302Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.7968426Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.7969080Z At global scope: 2025-05-07T19:50:49.7970394Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.8000043Z [36/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:50:49.8178722Z [37/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:50:49.8189931Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:49.8191280Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:49.8192406Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp:9: 2025-05-07T19:50:49.8194220Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.8197674Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 
'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8201821Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.8203841Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8205459Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.8208863Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8212910Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.8214931Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8216591Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.8220150Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8223993Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 
2025-05-07T19:50:49.8225977Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8227641Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.8231073Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8234979Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.8236865Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8238522Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.8241836Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8245788Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.8247884Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8249452Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.8252815Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8256678Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.8258858Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8260569Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.8263994Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8267809Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.8269841Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8271488Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.8274829Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8278629Z 140 | ASMJIT_INLINE_NODEBUG 
constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.8280873Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8282558Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.8285949Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8289740Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.8291790Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8293447Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.8296852Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8300748Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.8302478Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8303001Z At global scope: 2025-05-07T19:50:49.8328639Z cc1plus: note: 
unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.8943232Z [38/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:50:49.8954619Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:49.8956017Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:49.8957211Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp:9: 2025-05-07T19:50:49.8959086Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.8962947Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between 
different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8967025Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.8969123Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8970792Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.8974189Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8978211Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.8980372Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8982001Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.8985650Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.8989912Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == 
(RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.8991935Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.8993600Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.8997187Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9001295Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.9003225Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9004937Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.9008369Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9012765Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.9014919Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9016551Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.9019826Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9023463Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.9025380Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9026886Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.9030248Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9033977Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.9036186Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9037763Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.9041115Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9044857Z 140 | ASMJIT_INLINE_NODEBUG 
constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.9046824Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9048327Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.9051797Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9055730Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.9058036Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9059848Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.9063536Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9067548Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.9069698Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9070361Z At global scope: 2025-05-07T19:50:49.9071669Z cc1plus: note: 
unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:49.9082866Z [39/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:50:49.9103804Z [40/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:50:49.9246766Z [41/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:50:49.9322317Z [42/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:50:49.9693782Z [43/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:50:49.9732283Z [44/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:50:49.9817665Z [45/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:50:49.9828978Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:49.9830430Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64emithelper_p.h:13, 2025-05-07T19:50:49.9831695Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp:14: 2025-05-07T19:50:49.9833584Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:49.9837099Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9841415Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.9843547Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9845254Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:49.9848914Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9853051Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.9855170Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9856875Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:49.9860624Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9864906Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.9867027Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9868762Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:49.9872414Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9876354Z 135 | ASMJIT_INLINE_NODEBUG 
constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:49.9878331Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9880049Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:49.9883708Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9887853Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:49.9890144Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9891863Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:49.9895548Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9899998Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:49.9902323Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9903998Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In 
member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:49.9907571Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9911607Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:49.9913898Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9915578Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:49.9919175Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9923184Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:49.9925220Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9926878Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:49.9930506Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is 
deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9934612Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:49.9936744Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9940444Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:49.9944175Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:49.9948275Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:49.9950434Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:49.9951071Z At global scope: 2025-05-07T19:50:49.9952389Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:50.0329520Z [46/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include 
-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:50:50.1404904Z [47/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:50:50.1643632Z [48/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:50:50.3060819Z [49/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:50:50.3994176Z [50/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:50:50.4005337Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:50.4006665Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:50.4007793Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp:12: 2025-05-07T19:50:50.4009480Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:50.4012743Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise 
operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4016509Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:50.4018447Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4020345Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:50.4023687Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4027401Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:50.4029394Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4031001Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:50.4034373Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4038081Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | 
kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:50.4040252Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4041861Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:50.4045190Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4048816Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:50.4050633Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4052248Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:50.4055529Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4059426Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:50.4061457Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4062925Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 
2025-05-07T19:50:50.4066379Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4070236Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:50.4072178Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4073682Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:50.4077049Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4080627Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:50.4082516Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4084073Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:50.4087565Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4091385Z 140 
| ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:50.4093351Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4094934Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:50.4098025Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4101839Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:50.4103622Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4105082Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:50.4108156Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:50.4111774Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:50.4113684Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:50.4114278Z At global scope: 
2025-05-07T19:50:50.4115530Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:50.4180549Z [51/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:50:50.5042577Z [52/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:50:50.6965197Z [53/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG 
-std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:50:50.7203111Z [54/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -mavx2 -mf16c -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:50:50.8761260Z [55/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:50:50.9300765Z [56/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:50:50.9386224Z [57/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:50:51.0440232Z [58/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtils.cc 2025-05-07T19:50:51.1660607Z [59/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE 
-Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:50:51.5518495Z [60/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:50:51.5529258Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:51.5530529Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:51.5531702Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp:18: 2025-05-07T19:50:51.5533453Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:51.5537058Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5540930Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:51.5542819Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5544306Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:51.5547643Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5552048Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | 
kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:51.5554154Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5555821Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:51.5559342Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5563225Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:51.5565314Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5567020Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:51.5570707Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5574781Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:51.5576739Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5578469Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 
2025-05-07T19:50:51.5582333Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5586432Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:51.5588562Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5590258Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:51.5593606Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5597479Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:51.5599716Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5601509Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:51.5605116Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5609226Z 139 
| ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:51.5611241Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5612793Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:51.5616189Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5620352Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:51.5622700Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5624326Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:51.5627653Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5631440Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:51.5633457Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5635143Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:51.5638694Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.5642468Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:51.5644464Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.5645111Z At global scope: 2025-05-07T19:50:51.5646627Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:51.8550716Z [61/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:50:52.3232149Z [62/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o.d -o 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -c /__w/FBGEMM/FBGEMM/src/Utils.cc 2025-05-07T19:50:52.8228606Z [63/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:50:53.3913302Z [64/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,asmjit.so 
-o asmjit.so CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed && : 2025-05-07T19:50:53.3974949Z [65/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T19:50:53.3976899Z ################################################################################ 2025-05-07T19:50:53.3977492Z [CMAKE] Running post-build script ... 2025-05-07T19:50:53.3978313Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T19:50:53.3979137Z Removing all RPATHs ... 
2025-05-07T19:50:53.3979982Z ################################################################################ 2025-05-07T19:50:53.5271315Z [66/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -c /__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc 2025-05-07T19:50:55.8975586Z [67/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -c /__w/FBGEMM/FBGEMM/src/RefImplementations.cc 2025-05-07T19:50:58.0670379Z [68/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED 
-DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -c /__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc 2025-05-07T19:50:59.0153649Z [69/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:50:59.2377249Z [70/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:50:59.4893787Z [71/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:50:59.5486405Z [72/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp 
-MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:50:59.7003472Z [73/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:01.9078152Z [74/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD 
-MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:02.1996941Z [75/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:02.7442605Z [76/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:06.5585056Z [77/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc 2025-05-07T19:51:17.6845682Z [78/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 
-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc 2025-05-07T19:52:00.0579229Z [79/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc 2025-05-07T19:52:01.4356921Z [80/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o 2025-05-07T19:52:01.4379419Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:01.4874019Z [81/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm.so -o fbgemm.so CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,"\$ORIGIN" /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 
asmjit.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so && : 2025-05-07T19:52:01.8996249Z [82/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 1 2025-05-07T19:52:01.8998447Z ################################################################################ 2025-05-07T19:52:01.8999072Z [CMAKE] Running post-build script ... 2025-05-07T19:52:01.8999997Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T19:52:01.9001161Z Resetting RPATH to $ORIGIN ... 
2025-05-07T19:52:01.9001847Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T19:52:01.9002579Z ################################################################################ 2025-05-07T19:52:10.5753048Z [83/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o 2025-05-07T19:52:10.5774814Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:10.8903587Z [84/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o 2025-05-07T19:52:10.8929546Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:11.4179948Z [85/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o 2025-05-07T19:52:11.4199570Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:22.4408190Z [86/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o 2025-05-07T19:52:22.4430435Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:24.2246119Z [87/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o 2025-05-07T19:52:24.2268796Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:34.1928350Z [88/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o 2025-05-07T19:52:34.1950032Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:34.1952714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1954357Z static auto dtype() { 2025-05-07T19:52:34.1954855Z ^ 2025-05-07T19:52:34.1955079Z 2025-05-07T19:52:34.1955513Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:52:34.1956208Z 2025-05-07T19:52:34.1957683Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1959294Z static auto dtype() { 2025-05-07T19:52:34.1959652Z ^ 2025-05-07T19:52:34.1959823Z 2025-05-07T19:52:34.1961239Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1963007Z static auto dtype() { 2025-05-07T19:52:34.1963412Z ^ 2025-05-07T19:52:34.1963615Z 2025-05-07T19:52:34.1964904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1966685Z static auto dtype() { 2025-05-07T19:52:34.1967122Z ^ 2025-05-07T19:52:34.1967354Z 2025-05-07T19:52:34.1967807Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:52:34.1968487Z 2025-05-07T19:52:34.1970109Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1971746Z static auto dtype() { 2025-05-07T19:52:34.1972166Z ^ 2025-05-07T19:52:34.1972381Z 2025-05-07T19:52:34.1973850Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1975600Z static auto dtype() { 2025-05-07T19:52:34.1975998Z ^ 2025-05-07T19:52:34.1976227Z 2025-05-07T19:52:34.1977512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1979169Z static auto dtype() { 2025-05-07T19:52:34.1979757Z ^ 2025-05-07T19:52:34.1979951Z 2025-05-07T19:52:34.1980364Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:52:34.1980968Z 2025-05-07T19:52:34.1982367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1984074Z static auto dtype() { 2025-05-07T19:52:34.1984493Z ^ 2025-05-07T19:52:34.1984724Z 2025-05-07T19:52:34.1986204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1988189Z static auto dtype() { 2025-05-07T19:52:34.1988616Z ^ 2025-05-07T19:52:34.1988837Z 2025-05-07T19:52:34.1990203Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1991889Z static auto dtype() { 2025-05-07T19:52:34.1992299Z ^ 2025-05-07T19:52:34.1992526Z 2025-05-07T19:52:34.1992956Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:52:34.1993617Z 2025-05-07T19:52:34.1995109Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning 
#177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.1996797Z static auto dtype() { 2025-05-07T19:52:34.1997180Z ^ 2025-05-07T19:52:34.1997375Z 2025-05-07T19:52:34.1998842Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2001170Z static auto dtype() { 2025-05-07T19:52:34.2001599Z ^ 2025-05-07T19:52:34.2001805Z 2025-05-07T19:52:34.2003202Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2005039Z static auto dtype() { 2025-05-07T19:52:34.2005455Z ^ 2025-05-07T19:52:34.2005662Z 2025-05-07T19:52:34.2006059Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:52:34.2006681Z 2025-05-07T19:52:34.2008033Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2009741Z static auto dtype() { 2025-05-07T19:52:34.2010178Z ^ 2025-05-07T19:52:34.2010392Z 2025-05-07T19:52:34.2012168Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2014084Z static auto dtype() { 2025-05-07T19:52:34.2014480Z ^ 2025-05-07T19:52:34.2014700Z 2025-05-07T19:52:34.2016018Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2017728Z static auto dtype() { 2025-05-07T19:52:34.2018188Z ^ 2025-05-07T19:52:34.2018410Z 
2025-05-07T19:52:34.2018842Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:52:34.2019613Z 2025-05-07T19:52:34.2020963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2022620Z static auto dtype() { 2025-05-07T19:52:34.2023020Z ^ 2025-05-07T19:52:34.2023239Z 2025-05-07T19:52:34.2024604Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:52:34.2026398Z static auto dtype() { 2025-05-07T19:52:34.2026826Z ^ 2025-05-07T19:52:34.2027047Z 2025-05-07T19:56:29.2681598Z [89/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o 2025-05-07T19:56:29.2957619Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:29.2959294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:29.2960550Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:56:29.2961034Z ^ 2025-05-07T19:56:29.2961261Z 2025-05-07T19:56:29.2961528Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:29.2961907Z 2025-05-07T19:56:29.2962807Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:29.2963999Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:56:29.2964495Z ^ 2025-05-07T19:56:29.2964678Z 2025-05-07T19:56:29.3835876Z [90/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o 2025-05-07T19:56:29.3848556Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:29.3850170Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:29.3851386Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:56:29.3851895Z ^ 2025-05-07T19:56:29.3852083Z 2025-05-07T19:56:29.3852344Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:29.3852760Z 2025-05-07T19:56:29.3853599Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:29.3854796Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:56:29.3855303Z ^ 2025-05-07T19:56:29.3855523Z 2025-05-07T19:56:30.5648565Z [91/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o 2025-05-07T19:56:30.5660958Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:42.2208037Z [92/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o 2025-05-07T19:56:42.2225955Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:42.2228260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:56:42.2229631Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:56:42.2230131Z ^ 2025-05-07T19:56:42.2230332Z 2025-05-07T19:56:42.2230730Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:42.2231319Z 2025-05-07T19:56:42.2232301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:56:42.2233652Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:56:42.2234116Z ^ 2025-05-07T19:56:42.2234320Z 2025-05-07T19:56:42.2235334Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:56:42.2236686Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:56:42.2237202Z ^ 2025-05-07T19:56:42.2237397Z 2025-05-07T19:56:42.2237746Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:42.2238303Z 2025-05-07T19:56:42.2239487Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:56:42.2240768Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:56:42.2241160Z ^ 2025-05-07T19:56:42.2241359Z 2025-05-07T19:56:42.2242301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:56:42.2243616Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:56:42.2244032Z ^ 2025-05-07T19:56:42.2244209Z 2025-05-07T19:56:42.2244572Z Remark: The warnings can be suppressed with 
"-diag-suppress " 2025-05-07T19:56:42.2245109Z 2025-05-07T19:56:42.2246046Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:56:42.2247260Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:56:42.2247688Z ^ 2025-05-07T19:56:42.2247906Z 2025-05-07T19:56:42.2248841Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced 2025-05-07T19:56:42.2250164Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8; 2025-05-07T19:56:42.2250617Z ^ 2025-05-07T19:56:42.2250847Z 2025-05-07T19:56:42.2251208Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:42.2251737Z 2025-05-07T19:56:42.2252673Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced 2025-05-07T19:56:42.2254210Z constexpr int CVT_FP4_SF_VEC_SIZE = 16; 2025-05-07T19:56:42.2254693Z ^ 2025-05-07T19:56:42.2254878Z 2025-05-07T19:56:42.2255760Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:56:42.2257689Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:56:42.2259649Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored 2025-05-07T19:56:42.2261409Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range. 
.minnctapersm will be ignored 2025-05-07T19:56:45.4166919Z [93/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o 2025-05-07T19:56:45.4292550Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:56:56.1212892Z [94/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o 2025-05-07T19:56:56.1257020Z nvcc warning : Support for offline compilation for architectures prior 
to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:56:56.1259947Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:56.1262004Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:56:56.1262830Z ^ 2025-05-07T19:56:56.1263190Z 2025-05-07T19:56:56.1263651Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:56.1264321Z 2025-05-07T19:56:56.1265910Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:56.1268059Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:56:56.1268904Z ^ 2025-05-07T19:56:56.1269204Z 2025-05-07T19:56:57.6722785Z [95/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o 2025-05-07T19:56:57.6747093Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use 
-Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:56:57.6750014Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:57.6752204Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:56:57.6753025Z ^ 2025-05-07T19:56:57.6753397Z 2025-05-07T19:56:57.6753853Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:56:57.6754555Z 2025-05-07T19:56:57.6756084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:56:57.6758231Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:56:57.6777086Z ^ 2025-05-07T19:56:57.6777500Z 2025-05-07T19:56:58.8030038Z [96/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o 2025-05-07T19:56:58.8052472Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:00.4146130Z [97/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o 2025-05-07T19:57:00.4170438Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:00.4173440Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:00.4175537Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:00.4176357Z ^ 2025-05-07T19:57:00.4176670Z 2025-05-07T19:57:00.4177133Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:00.4177785Z 2025-05-07T19:57:00.4179339Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:00.4181713Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:00.4182554Z ^ 2025-05-07T19:57:00.4182859Z 2025-05-07T19:57:01.0443686Z [98/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o 2025-05-07T19:57:01.0467294Z nvcc 
warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:01.0470117Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:01.0472154Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:01.0472919Z ^ 2025-05-07T19:57:01.0473215Z 2025-05-07T19:57:01.0473663Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:01.0474297Z 2025-05-07T19:57:01.0475770Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:01.0477864Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:01.0478650Z ^ 2025-05-07T19:57:01.0478931Z 2025-05-07T19:57:06.8590743Z [99/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o 2025-05-07T19:57:06.8614173Z nvcc warning : Support for offline compilation for architectures prior to 
'_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:06.8617030Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:06.8619050Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:06.8619932Z ^ 2025-05-07T19:57:06.8620288Z 2025-05-07T19:57:06.8620697Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:06.8621338Z 2025-05-07T19:57:06.8622784Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:06.8624809Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:06.8625567Z ^ 2025-05-07T19:57:06.8625851Z 2025-05-07T19:57:17.7482988Z [100/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o 2025-05-07T19:57:17.7496018Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use 
-Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:17.7497632Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:17.7498853Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:17.7499308Z ^ 2025-05-07T19:57:17.7499621Z 2025-05-07T19:57:17.7499877Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.7500461Z 2025-05-07T19:57:17.7501327Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:17.7502510Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:17.7502978Z ^ 2025-05-07T19:57:17.7503150Z 2025-05-07T19:57:30.3335553Z [101/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o 2025-05-07T19:57:30.3348703Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets 
to suppress warning). 2025-05-07T19:57:30.3350324Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:30.3351482Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:30.3351954Z ^ 2025-05-07T19:57:30.3352128Z 2025-05-07T19:57:30.3352378Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:30.3352760Z 2025-05-07T19:57:30.3353587Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:30.3354774Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:30.3355223Z ^ 2025-05-07T19:57:30.3355415Z 2025-05-07T19:57:31.1685617Z [102/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:57:35.1031847Z [103/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o 2025-05-07T19:57:35.1053306Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:35.1055845Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:35.1057821Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:35.1058603Z ^ 2025-05-07T19:57:35.1058894Z 2025-05-07T19:57:35.1059288Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:35.1060041Z 2025-05-07T19:57:35.1061466Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:35.1063439Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:35.1064162Z ^ 2025-05-07T19:57:35.1064444Z 2025-05-07T19:57:37.2240395Z [104/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o 2025-05-07T19:57:37.2263351Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:41.5884903Z [105/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:57:44.5350902Z [106/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o 2025-05-07T19:57:44.5374070Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:44.5376596Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:44.5378653Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:44.5408200Z ^ 2025-05-07T19:57:44.5408568Z 2025-05-07T19:57:44.5409072Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:44.5409751Z 2025-05-07T19:57:44.5411253Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:44.5473662Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:44.5474476Z ^ 2025-05-07T19:57:44.5474757Z 2025-05-07T19:57:44.5486449Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:44.5510904Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_10multipliesES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_IS1S_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:44.5661274Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:44.5685970Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_INS_10multipliesEffLS1T_2EvEEJS1X_NS1Q_IS1Z_JNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:44.5710387Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:44.5756519Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_S1O_LNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_S1O_S1O_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1Q_INS1R_INS_10multipliesES1O_fLS1T_2EvEEJNS1V_ILi0ESI_ffS1W_Li4ELb1EEENS1Q_INS1R_IS1Y_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:49.9252228Z [107/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o 2025-05-07T19:57:49.9274108Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:49.9276984Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:49.9279059Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:49.9279857Z ^ 2025-05-07T19:57:49.9318344Z 2025-05-07T19:57:49.9319028Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:49.9319723Z 2025-05-07T19:57:49.9321196Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:49.9440247Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:49.9441382Z ^ 2025-05-07T19:57:49.9441724Z 2025-05-07T19:57:54.1056331Z [108/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o 2025-05-07T19:57:54.1148131Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:54.1151099Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:54.1153206Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:54.1241869Z ^ 2025-05-07T19:57:54.1467004Z 2025-05-07T19:57:54.1581296Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:54.1645370Z 2025-05-07T19:57:54.1647314Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:54.1764304Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:54.1765537Z ^ 2025-05-07T19:57:54.1911113Z 2025-05-07T19:57:54.1923417Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_10multipliesES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_IS1Q_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:54.1947243Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:54.2202668Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_INS_10multipliesEffLS1R_2EvEEJS1V_NS1O_IS1X_JNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:54.2438182Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:54.2464311Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_S1M_LNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_S1M_S1M_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1O_INS1P_INS_10multipliesES1M_fLS1R_2EvEEJNS1T_ILi0ESI_ffS1U_Li4ELb1EEENS1O_INS1P_IS1W_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:54.2491014Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:57:55.2525873Z [109/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o 2025-05-07T19:57:55.2549039Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:55.2552260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:55.2554354Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:55.2555128Z ^ 2025-05-07T19:57:55.2555434Z 2025-05-07T19:57:55.2555847Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:55.2556476Z 2025-05-07T19:57:55.2558023Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:55.2560267Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:55.2561103Z ^ 2025-05-07T19:57:55.2561417Z 2025-05-07T19:57:55.6840664Z [110/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o 2025-05-07T19:57:55.6863767Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:57:55.6867180Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:55.6874536Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:55.6919821Z ^ 2025-05-07T19:57:55.6920197Z 2025-05-07T19:57:55.6920630Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:55.6921243Z 2025-05-07T19:57:55.6922816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:55.6924945Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:55.6925647Z ^ 2025-05-07T19:57:55.6925873Z 2025-05-07T19:58:14.0625544Z [111/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o 2025-05-07T19:58:14.0674987Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:14.0677322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:14.0679385Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:14.0703647Z ^ 2025-05-07T19:58:14.0743048Z 2025-05-07T19:58:14.0771054Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:14.0771894Z 2025-05-07T19:58:14.0773439Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:14.0853931Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:14.0854659Z ^ 2025-05-07T19:58:14.0854900Z 2025-05-07T19:58:16.0150026Z [112/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o 2025-05-07T19:58:16.2103490Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:16.2106694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:16.2108974Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:16.2109744Z ^ 2025-05-07T19:58:16.2110030Z 2025-05-07T19:58:16.2110449Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:16.2111079Z 2025-05-07T19:58:16.2112551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:16.2114570Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:16.2115316Z ^ 2025-05-07T19:58:16.2115614Z 2025-05-07T19:58:16.7544094Z [113/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o 2025-05-07T19:58:16.7564312Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:16.8476282Z [114/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode 
arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o 2025-05-07T19:58:16.8488460Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:41.7141769Z [115/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o 2025-05-07T19:58:41.7163518Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:48.8849256Z [116/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o 2025-05-07T19:58:48.8871365Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:48.8874444Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:48.8876442Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:48.8877186Z ^ 2025-05-07T19:58:48.8877463Z 2025-05-07T19:58:48.8877870Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:48.8878485Z 2025-05-07T19:58:48.8879960Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:48.8882038Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:48.8882826Z ^ 2025-05-07T19:58:48.8883113Z 2025-05-07T19:58:50.4816461Z [117/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o 2025-05-07T19:58:50.4829427Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:50.4831029Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:50.4832258Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:50.4832734Z ^ 2025-05-07T19:58:50.4834278Z 2025-05-07T19:58:50.4834559Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:50.4834928Z 2025-05-07T19:58:50.4835782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:50.4836960Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:50.4837426Z ^ 2025-05-07T19:58:50.4837601Z 2025-05-07T19:59:03.3905161Z [118/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o 2025-05-07T19:59:03.3945769Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:03.3947498Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.3948784Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.3949281Z ^ 2025-05-07T19:59:03.3949501Z 2025-05-07T19:59:03.3949787Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.3950440Z 2025-05-07T19:59:03.3951278Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.3952469Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:03.3952935Z ^ 2025-05-07T19:59:03.3953110Z 2025-05-07T19:59:03.3953918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.3955086Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.3955549Z ^ 2025-05-07T19:59:03.3955815Z detected during: 2025-05-07T19:59:03.3970915Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.3999616Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.4029028Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.4045366Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.4046543Z 2025-05-07T19:59:03.4046795Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.4047179Z 2025-05-07T19:59:03.4047992Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.4049141Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.4049561Z ^ 2025-05-07T19:59:03.4049817Z detected during: 2025-05-07T19:59:03.4063810Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.4092508Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.4121298Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.4150208Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.4166548Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.4167732Z 2025-05-07T19:59:03.4168545Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.4169716Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.4170171Z ^ 2025-05-07T19:59:03.4170457Z detected during: 2025-05-07T19:59:03.4185527Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.4214057Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.4242932Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.4259306Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.4260559Z 2025-05-07T19:59:03.4260813Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.4261181Z 2025-05-07T19:59:03.4262018Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.4263216Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.4263653Z ^ 2025-05-07T19:59:03.4263886Z detected during: 2025-05-07T19:59:03.4277882Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.4306983Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.4335431Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.4364304Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.4380689Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.4381925Z 2025-05-07T19:59:03.4382737Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.4383914Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.4384366Z ^ 2025-05-07T19:59:03.4384655Z detected during: 2025-05-07T19:59:03.4399596Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.4428386Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.4457227Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.4473807Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.4474974Z 2025-05-07T19:59:03.4475224Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.4475587Z 2025-05-07T19:59:03.4515500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.4516958Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.4517397Z ^ 2025-05-07T19:59:03.4517640Z detected during: 2025-05-07T19:59:03.4531917Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.4560842Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.4589299Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.4618461Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.4634921Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.4636078Z 2025-05-07T19:59:03.4637385Z ptxas /tmp/tmpxft_00007e70_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.4695344Z ptxas /tmp/tmpxft_00007e70_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.4812225Z ptxas /tmp/tmpxft_00007e70_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.4815974Z ptxas /tmp/tmpxft_00007e70_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.4818164Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.4819320Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.4819910Z ^ 2025-05-07T19:59:03.4877110Z detected during: 2025-05-07T19:59:03.5056210Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.5479833Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.5738064Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.5854566Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.5855971Z 2025-05-07T19:59:03.5856275Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.5892257Z 2025-05-07T19:59:03.5893263Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.5928311Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.5982040Z ^ 2025-05-07T19:59:03.5982549Z detected during: 2025-05-07T19:59:03.6224978Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.6262267Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T19:59:03.6306588Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.6346337Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.6375903Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.6377075Z 2025-05-07T19:59:03.6377906Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.6379062Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.6384956Z ^ 2025-05-07T19:59:03.6389911Z detected during: 2025-05-07T19:59:03.6414646Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.6451230Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.6498940Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.6519972Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.6521134Z 2025-05-07T19:59:03.6521387Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.6526569Z 2025-05-07T19:59:03.6527401Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.6531858Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.6532274Z ^ 2025-05-07T19:59:03.6539341Z detected during: 2025-05-07T19:59:03.6553490Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.6594977Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.6640205Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.6678086Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.6698873Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.6703873Z 2025-05-07T19:59:03.6704687Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.6705867Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.6706321Z ^ 2025-05-07T19:59:03.6710113Z detected during: 2025-05-07T19:59:03.6739029Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.6782631Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.6815660Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.6832050Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.6833228Z 2025-05-07T19:59:03.6837541Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.6837926Z 2025-05-07T19:59:03.6838760Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.6843642Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.6847408Z ^ 2025-05-07T19:59:03.6848054Z detected during: 2025-05-07T19:59:03.6862146Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.6890799Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.6919483Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.6954168Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.6975855Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:59:03.6980484Z 2025-05-07T19:59:03.6992046Z [119/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o 2025-05-07T19:59:03.7016891Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:03.7022418Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7023717Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7024190Z ^ 2025-05-07T19:59:03.7024365Z 2025-05-07T19:59:03.7024621Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.7024990Z 2025-05-07T19:59:03.7025838Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7027004Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:03.7035714Z ^ 2025-05-07T19:59:03.7035892Z 2025-05-07T19:59:03.7040918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7045421Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7050315Z ^ 2025-05-07T19:59:03.7050581Z detected during: 2025-05-07T19:59:03.7076232Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7104959Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7139040Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7155689Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7156865Z 2025-05-07T19:59:03.7157144Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.7157511Z 2025-05-07T19:59:03.7158326Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7159472Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7164003Z ^ 2025-05-07T19:59:03.7164252Z detected during: 2025-05-07T19:59:03.7178259Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.7212600Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7246190Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7279471Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7299580Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7300907Z 2025-05-07T19:59:03.7301722Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7302891Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7303361Z ^ 2025-05-07T19:59:03.7303635Z detected during: 2025-05-07T19:59:03.7330500Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7358929Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7387699Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7404253Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7405418Z 2025-05-07T19:59:03.7405688Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.7406147Z 2025-05-07T19:59:03.7406957Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7415914Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7416331Z ^ 2025-05-07T19:59:03.7425991Z detected during: 2025-05-07T19:59:03.7440141Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.7480084Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7508634Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7537484Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7553985Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7555207Z 2025-05-07T19:59:03.7556017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7557188Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7557662Z ^ 2025-05-07T19:59:03.7557934Z detected during: 2025-05-07T19:59:03.7572798Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7601437Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7630289Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7646677Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7647829Z 2025-05-07T19:59:03.7648084Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.7648472Z 2025-05-07T19:59:03.7649280Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7650422Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7650833Z ^ 2025-05-07T19:59:03.7651083Z detected during: 2025-05-07T19:59:03.7665241Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.7693921Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7722480Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7751449Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7767868Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7769043Z 2025-05-07T19:59:03.7770298Z ptxas /tmp/tmpxft_00007e68_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.7772870Z ptxas /tmp/tmpxft_00007e68_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.7775455Z ptxas /tmp/tmpxft_00007e68_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.7778085Z ptxas /tmp/tmpxft_00007e68_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T19:59:03.7780296Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7781475Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7781937Z ^ 2025-05-07T19:59:03.7782227Z detected during: 2025-05-07T19:59:03.7797087Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.7825711Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7854478Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7870904Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7872080Z 2025-05-07T19:59:03.7872339Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.7872705Z 2025-05-07T19:59:03.7873528Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7874652Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7875080Z ^ 2025-05-07T19:59:03.7875312Z detected during: 2025-05-07T19:59:03.7889352Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.7918602Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T19:59:03.7947084Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.7975831Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.7992280Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.7993442Z 2025-05-07T19:59:03.7994267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.7995426Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.7995899Z ^ 2025-05-07T19:59:03.7996191Z detected during: 2025-05-07T19:59:03.8011453Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.8039915Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.8068894Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.8085305Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.8086468Z 2025-05-07T19:59:03.8086751Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.8087117Z 2025-05-07T19:59:03.8087926Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.8089134Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.8089575Z ^ 2025-05-07T19:59:03.8089809Z detected during: 2025-05-07T19:59:03.8104170Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.8133082Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.8162062Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.8191081Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.8211971Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.8213155Z 2025-05-07T19:59:03.8214086Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.8215293Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.8215789Z ^ 2025-05-07T19:59:03.8216083Z detected during: 2025-05-07T19:59:03.8231209Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.8259749Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.8292702Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.8309381Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.8310535Z 2025-05-07T19:59:03.8310785Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:03.8311157Z 2025-05-07T19:59:03.8311961Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:03.8313085Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:03.8313489Z ^ 2025-05-07T19:59:03.8313728Z detected during: 2025-05-07T19:59:03.8327822Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:03.8356545Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:03.8384869Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:03.8413793Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:03.8430160Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:59:03.8431329Z 2025-05-07T19:59:10.3392519Z [120/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o 2025-05-07T19:59:10.3405018Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:11.0558994Z [121/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_example_py.so -o experimental/example/fbgemm_gpu_experimental_example_py.so experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 
-Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -lrt -lpthread -ldl && : 2025-05-07T19:59:11.0817985Z [122/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/example && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T19:59:11.0819769Z ################################################################################ 2025-05-07T19:59:11.0820146Z [CMAKE] Running post-build script ... 2025-05-07T19:59:11.0820922Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T19:59:11.0821667Z Removing all RPATHs ... 
2025-05-07T19:59:11.0822002Z ################################################################################ 2025-05-07T19:59:16.0756913Z [123/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o 2025-05-07T19:59:16.0769131Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:35.1087306Z [124/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o 2025-05-07T19:59:35.1100097Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:35.1101963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:35.1103187Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:35.1103686Z ^ 2025-05-07T19:59:35.1103870Z 2025-05-07T19:59:35.1104129Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:35.1104527Z 2025-05-07T19:59:35.1105368Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:35.1106579Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:35.1107066Z ^ 2025-05-07T19:59:35.1107270Z 2025-05-07T19:59:39.0249352Z [125/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o 2025-05-07T19:59:39.0264854Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:39.0266440Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:39.0267635Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:39.0268095Z ^ 2025-05-07T19:59:39.0268299Z 2025-05-07T19:59:39.0268564Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:39.0268931Z 2025-05-07T19:59:39.0269784Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:39.0270972Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:39.0271448Z ^ 2025-05-07T19:59:39.0271629Z 2025-05-07T19:59:56.5494182Z [126/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o 2025-05-07T19:59:56.5508157Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:56.5509805Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5510968Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.5511450Z ^ 2025-05-07T19:59:56.5511635Z 2025-05-07T19:59:56.5511894Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.5512294Z 2025-05-07T19:59:56.5513132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5514331Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:56.5514788Z ^ 2025-05-07T19:59:56.5514984Z 2025-05-07T19:59:56.5515786Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5516966Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.5517417Z ^ 2025-05-07T19:59:56.5517709Z detected during: 2025-05-07T19:59:56.5532602Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.5561052Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.5589566Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.5606038Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.5607224Z 2025-05-07T19:59:56.5607481Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.5607871Z 2025-05-07T19:59:56.5608686Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5609845Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.5610275Z ^ 2025-05-07T19:59:56.5610538Z detected during: 2025-05-07T19:59:56.5624817Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:56.5653478Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.5681463Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.5711260Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.5727508Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.5728689Z 2025-05-07T19:59:56.5729506Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5730788Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.5731248Z ^ 2025-05-07T19:59:56.5731555Z detected during: 2025-05-07T19:59:56.5746413Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.5774540Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.5803253Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.5819532Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.5820729Z 2025-05-07T19:59:56.5820991Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.5821371Z 2025-05-07T19:59:56.5822212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.5823512Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.5823983Z ^ 2025-05-07T19:59:56.5824235Z detected during: 2025-05-07T19:59:56.5838419Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T19:59:56.5867291Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6135339Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6164870Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6217619Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6219227Z 2025-05-07T19:59:56.6220221Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6221421Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6221922Z ^ 2025-05-07T19:59:56.6222219Z detected during: 2025-05-07T19:59:56.6288579Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6317514Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6346145Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6362386Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6363564Z 2025-05-07T19:59:56.6363930Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.6364312Z 2025-05-07T19:59:56.6365138Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6366318Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6366747Z ^ 2025-05-07T19:59:56.6367034Z detected during: 2025-05-07T19:59:56.6381249Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:56.6410051Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6438294Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6466808Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6483114Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6484272Z 2025-05-07T19:59:56.6485096Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6486304Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6486822Z ^ 2025-05-07T19:59:56.6487120Z detected during: 2025-05-07T19:59:56.6502147Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6530224Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6558713Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6575001Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6576196Z 2025-05-07T19:59:56.6576462Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.6576868Z 2025-05-07T19:59:56.6577687Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6578870Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6579309Z ^ 2025-05-07T19:59:56.6579644Z detected during: 2025-05-07T19:59:56.6593804Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T19:59:56.6622707Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6650860Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6679382Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6696313Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6697482Z 2025-05-07T19:59:56.6698325Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6699545Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6700044Z ^ 2025-05-07T19:59:56.6700528Z detected during: 2025-05-07T19:59:56.6715310Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6743318Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6771892Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6788090Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6789247Z 2025-05-07T19:59:56.6789528Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.6789899Z 2025-05-07T19:59:56.6790710Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6791856Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6792300Z ^ 2025-05-07T19:59:56.6792540Z detected during: 2025-05-07T19:59:56.6806869Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:56.6835541Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6863579Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6891963Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6908322Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.6909477Z 2025-05-07T19:59:56.6910281Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.6911434Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.6911888Z ^ 2025-05-07T19:59:56.6912166Z detected during: 2025-05-07T19:59:56.6926948Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.6955069Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.6983616Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.6999737Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.7001053Z 2025-05-07T19:59:56.7001301Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:56.7001679Z 2025-05-07T19:59:56.7002479Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:56.7003613Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:56.7004015Z ^ 2025-05-07T19:59:56.7004253Z detected during: 2025-05-07T19:59:56.7018308Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T19:59:56.7046872Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:56.7074890Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:56.7103517Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:56.7119936Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:59:56.7121106Z 2025-05-07T19:59:57.6907051Z [127/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o 2025-05-07T19:59:57.6920022Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:57.6921663Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.6922860Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.6923328Z ^ 2025-05-07T19:59:57.6923531Z 2025-05-07T19:59:57.6923790Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.6924169Z 2025-05-07T19:59:57.6925015Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.6926198Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:57.6926676Z ^ 2025-05-07T19:59:57.6926855Z 2025-05-07T19:59:57.6927659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.6928843Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.6929328Z ^ 2025-05-07T19:59:57.6929609Z detected during: 2025-05-07T19:59:57.6944614Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.6972749Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7001710Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7018244Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7019482Z 2025-05-07T19:59:57.7019743Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.7020146Z 2025-05-07T19:59:57.7020962Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7022132Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7022661Z ^ 2025-05-07T19:59:57.7022932Z detected during: 2025-05-07T19:59:57.7037002Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:57.7065650Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7124866Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7154053Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7170777Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7171987Z 2025-05-07T19:59:57.7172811Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7174046Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7174535Z ^ 2025-05-07T19:59:57.7174815Z detected during: 2025-05-07T19:59:57.7190233Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7218627Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7247133Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7263330Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7264516Z 2025-05-07T19:59:57.7264774Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.7265218Z 2025-05-07T19:59:57.7266031Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7267167Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7267604Z ^ 2025-05-07T19:59:57.7267868Z detected during: 2025-05-07T19:59:57.7281945Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T19:59:57.7311041Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7339348Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7367862Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7384124Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7385324Z 2025-05-07T19:59:57.7386219Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7387404Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7387871Z ^ 2025-05-07T19:59:57.7388179Z detected during: 2025-05-07T19:59:57.7403070Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7430993Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7459525Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7475724Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7476954Z 2025-05-07T19:59:57.7477218Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.7477598Z 2025-05-07T19:59:57.7478432Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7480360Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7480813Z ^ 2025-05-07T19:59:57.7481065Z detected during: 2025-05-07T19:59:57.7495120Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:57.7524039Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7552057Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7580548Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7596889Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7598063Z 2025-05-07T19:59:57.7598903Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7600068Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7600765Z ^ 2025-05-07T19:59:57.7601049Z detected during: 2025-05-07T19:59:57.7615896Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7644048Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7672860Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7689062Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7690258Z 2025-05-07T19:59:57.7690543Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.7690911Z 2025-05-07T19:59:57.7691724Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7692883Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7693317Z ^ 2025-05-07T19:59:57.7693592Z detected during: 2025-05-07T19:59:57.7708045Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T19:59:57.7736624Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7764501Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7793059Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7809429Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7810592Z 2025-05-07T19:59:57.7811409Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7812735Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7813223Z ^ 2025-05-07T19:59:57.7813506Z detected during: 2025-05-07T19:59:57.7828712Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7856739Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.7885166Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.7901655Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.7902856Z 2025-05-07T19:59:57.7903117Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.7903507Z 2025-05-07T19:59:57.7904316Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.7905447Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.7905888Z ^ 2025-05-07T19:59:57.7906168Z detected during: 2025-05-07T19:59:57.7920285Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T19:59:57.7948934Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.7976969Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.8005915Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.8022245Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.8023415Z 2025-05-07T19:59:57.8024233Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.8025422Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.8025914Z ^ 2025-05-07T19:59:57.8026198Z detected during: 2025-05-07T19:59:57.8040907Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.8068862Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.8097337Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.8113880Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.8115248Z 2025-05-07T19:59:57.8115606Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:57.8116162Z 2025-05-07T19:59:57.8117522Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:57.8118899Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:57.8119335Z ^ 2025-05-07T19:59:57.8119598Z detected during: 2025-05-07T19:59:57.8133809Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T19:59:57.8163739Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T19:59:57.8191817Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T19:59:57.8220586Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T19:59:57.8236791Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:59:57.8237978Z 2025-05-07T20:00:00.5953494Z [128/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o 2025-05-07T20:00:00.5966666Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets 
to suppress warning). 2025-05-07T20:00:00.5968250Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:00.5969415Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:00.5969876Z ^ 2025-05-07T20:00:00.5970048Z 2025-05-07T20:00:00.5970298Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:00.5970686Z 2025-05-07T20:00:00.5971513Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:00.5972664Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:00.5973108Z ^ 2025-05-07T20:00:00.5973273Z 2025-05-07T20:00:02.6314483Z [129/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o 2025-05-07T20:00:02.6327619Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:02.6329240Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6330433Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6330896Z ^ 2025-05-07T20:00:02.6331077Z 2025-05-07T20:00:02.6331361Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.6331733Z 2025-05-07T20:00:02.6332570Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6333783Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:02.6334259Z ^ 2025-05-07T20:00:02.6334436Z 2025-05-07T20:00:02.6335243Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6336428Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6336884Z ^ 2025-05-07T20:00:02.6337181Z detected during: 2025-05-07T20:00:02.6352141Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.6380359Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.6409127Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.6425598Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.6426785Z 2025-05-07T20:00:02.6427044Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.6427418Z 2025-05-07T20:00:02.6428259Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6429391Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6429835Z ^ 2025-05-07T20:00:02.6430107Z detected during: 2025-05-07T20:00:02.6444271Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:02.6473175Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.6501509Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.6530071Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.6546457Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 
2025-05-07T20:00:02.6547620Z 2025-05-07T20:00:02.6548465Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6549680Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6550172Z ^ 2025-05-07T20:00:02.6550458Z detected during: 2025-05-07T20:00:02.6565181Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.6593280Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.6622028Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.6638252Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.6639413Z 2025-05-07T20:00:02.6639698Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.6640070Z 2025-05-07T20:00:02.6640946Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6642108Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6642529Z ^ 2025-05-07T20:00:02.6642803Z detected during: 2025-05-07T20:00:02.6656924Z instantiation of "auto 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:02.6685637Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.6714122Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.6742551Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.6758705Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.6759858Z 2025-05-07T20:00:02.6760663Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6761862Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6762338Z ^ 2025-05-07T20:00:02.6762615Z detected during: 2025-05-07T20:00:02.6777407Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.6805751Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.6834192Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.6850393Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.6851555Z 2025-05-07T20:00:02.6851807Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.6852183Z 2025-05-07T20:00:02.6852987Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6854120Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6854527Z ^ 2025-05-07T20:00:02.6854761Z detected during: 2025-05-07T20:00:02.6868866Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:02.6897416Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.6925726Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.6954266Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.6970426Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.6971618Z 2025-05-07T20:00:02.6972423Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.6973585Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.6974026Z ^ 2025-05-07T20:00:02.6974302Z detected during: 2025-05-07T20:00:02.6989049Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.7017342Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.7045835Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.7062076Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.7063254Z 2025-05-07T20:00:02.7063514Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.7063884Z 2025-05-07T20:00:02.7064731Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:00:02.7065869Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.7066309Z ^ 2025-05-07T20:00:02.7066548Z detected during: 2025-05-07T20:00:02.7080606Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:02.7109390Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.7137461Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.7166095Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.7182508Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.7183684Z 2025-05-07T20:00:02.7184532Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.7185718Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.7186220Z ^ 2025-05-07T20:00:02.7186501Z detected during: 2025-05-07T20:00:02.7201623Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, 
cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.7229940Z instantiation of "void 
cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.7258709Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.7275006Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.7276161Z 2025-05-07T20:00:02.7276441Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.7276816Z 2025-05-07T20:00:02.7277636Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.7278788Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.7279238Z ^ 2025-05-07T20:00:02.7279492Z detected during: 2025-05-07T20:00:02.7293636Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:02.7322616Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.7350771Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.7379595Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.7395880Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.7397036Z 2025-05-07T20:00:02.7397851Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.7399041Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.7399525Z ^ 2025-05-07T20:00:02.7399803Z detected during: 2025-05-07T20:00:02.7414754Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.7442920Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.7471398Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.7487630Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.7488794Z 2025-05-07T20:00:02.7489081Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:00:02.7489478Z 2025-05-07T20:00:02.7490294Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.7491439Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.7491857Z ^ 2025-05-07T20:00:02.7492121Z detected during: 2025-05-07T20:00:02.7506441Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:02.7535200Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:02.7563260Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:02.7592243Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:02.7608554Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:00:02.7609738Z 2025-05-07T20:00:04.7908601Z [130/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 
-MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 2025-05-07T20:00:04.7921370Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:04.7922986Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.7924175Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.7924665Z ^ 2025-05-07T20:00:04.7924847Z 2025-05-07T20:00:04.7925107Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.7925484Z 2025-05-07T20:00:04.7926333Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.7927523Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:04.7928015Z ^ 2025-05-07T20:00:04.7928194Z 2025-05-07T20:00:04.7929101Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.7930251Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.7930735Z ^ 2025-05-07T20:00:04.7931016Z detected during: 2025-05-07T20:00:04.7946010Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.7974163Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8003131Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8019532Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8020744Z 2025-05-07T20:00:04.8021030Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.8021400Z 2025-05-07T20:00:04.8022215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8023397Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8023818Z ^ 2025-05-07T20:00:04.8024080Z detected during: 2025-05-07T20:00:04.8038264Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, 
SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:04.8067084Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8095300Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8124120Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8140522Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8141689Z 2025-05-07T20:00:04.8142500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8143684Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8144163Z ^ 2025-05-07T20:00:04.8144439Z detected during: 2025-05-07T20:00:04.8160008Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8188231Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8217072Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8233366Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, 
USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8234598Z 2025-05-07T20:00:04.8234858Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.8235257Z 2025-05-07T20:00:04.8236065Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8237196Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8237632Z ^ 2025-05-07T20:00:04.8237905Z detected during: 2025-05-07T20:00:04.8252069Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:04.8280818Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8309245Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8337923Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8354223Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8355405Z 2025-05-07T20:00:04.8356216Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8357396Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8357852Z ^ 2025-05-07T20:00:04.8358154Z detected during: 2025-05-07T20:00:04.8372923Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8401255Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8429948Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8446227Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8447397Z 2025-05-07T20:00:04.8447656Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.8448027Z 2025-05-07T20:00:04.8448862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8449987Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8450429Z ^ 2025-05-07T20:00:04.8450667Z detected during: 2025-05-07T20:00:04.8464980Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:04.8494585Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8523320Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8552078Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8568496Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8569655Z 2025-05-07T20:00:04.8570493Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8571699Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8572178Z ^ 2025-05-07T20:00:04.8572454Z detected during: 2025-05-07T20:00:04.8587386Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8615785Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8644466Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8660758Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8661919Z 2025-05-07T20:00:04.8662191Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.8662560Z 2025-05-07T20:00:04.8663374Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8664595Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8665023Z ^ 2025-05-07T20:00:04.8665287Z detected during: 2025-05-07T20:00:04.8679458Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:04.8708379Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8736537Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8765326Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8781597Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8782818Z 2025-05-07T20:00:04.8783643Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8784824Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8785332Z ^ 2025-05-07T20:00:04.8785609Z detected during: 2025-05-07T20:00:04.8800669Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8829878Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8858540Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8874865Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8876053Z 2025-05-07T20:00:04.8876312Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.8876709Z 2025-05-07T20:00:04.8877523Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8878676Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:00:04.8879104Z ^ 2025-05-07T20:00:04.8879369Z detected during: 2025-05-07T20:00:04.8893528Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:04.8922705Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.8950863Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.8979443Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.8995819Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.8997040Z 2025-05-07T20:00:04.8997856Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.8999045Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.8999506Z ^ 2025-05-07T20:00:04.8999816Z detected during: 2025-05-07T20:00:04.9014842Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.9043027Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.9071684Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.9087967Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.9089128Z 2025-05-07T20:00:04.9089387Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:04.9089779Z 2025-05-07T20:00:04.9090607Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:04.9091758Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:04.9092182Z ^ 2025-05-07T20:00:04.9092442Z detected during: 2025-05-07T20:00:04.9106948Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:04.9135730Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:04.9163921Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:04.9192776Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:04.9209240Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:00:04.9210419Z 2025-05-07T20:00:05.1760328Z [131/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 2025-05-07T20:00:05.1809813Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:05.1812861Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.1814872Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.1815732Z ^ 2025-05-07T20:00:05.1816341Z 2025-05-07T20:00:05.1816776Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:05.1817406Z 2025-05-07T20:00:05.1818880Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.1821159Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:05.1821932Z ^ 2025-05-07T20:00:05.1822251Z 2025-05-07T20:00:05.1823769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.1825808Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.1826442Z ^ 2025-05-07T20:00:05.1826875Z detected during: 2025-05-07T20:00:05.1853469Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.1904187Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.1954247Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.1983400Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.1985516Z 2025-05-07T20:00:05.1985949Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:05.1986582Z 2025-05-07T20:00:05.1987994Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.1990016Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.1990766Z ^ 2025-05-07T20:00:05.1991333Z detected during: 2025-05-07T20:00:05.2016689Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:05.2066863Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.2115953Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.2166632Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.2194593Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 
2025-05-07T20:00:05.2196748Z 2025-05-07T20:00:05.2198012Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.2200113Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.2201149Z ^ 2025-05-07T20:00:05.2201636Z detected during: 2025-05-07T20:00:05.2228205Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.2278462Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.2329795Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.2359075Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.2361359Z 2025-05-07T20:00:05.2361835Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:05.2362481Z 2025-05-07T20:00:05.2363889Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.2365948Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.2366702Z ^ 2025-05-07T20:00:05.2367065Z detected during: 2025-05-07T20:00:05.2414651Z instantiation of 
"auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:05.2466717Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.2517349Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.2568707Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.2598239Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.2600855Z 2025-05-07T20:00:05.2602313Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.2604381Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.2605243Z ^ 2025-05-07T20:00:05.2605713Z detected during: 2025-05-07T20:00:05.2632435Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.2682361Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.2734244Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.2762909Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.2764966Z 2025-05-07T20:00:05.2765408Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:05.2766260Z 2025-05-07T20:00:05.2767676Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.2769681Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.2770411Z ^ 2025-05-07T20:00:05.2770842Z detected during: 2025-05-07T20:00:05.2796178Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:05.2848034Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.2897670Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.2928702Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.2945231Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.2946436Z 2025-05-07T20:00:05.2947258Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.2948450Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.2948914Z ^ 2025-05-07T20:00:05.2949217Z detected during: 2025-05-07T20:00:05.2964105Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.2992572Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.3021596Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.3038012Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.3039224Z 2025-05-07T20:00:05.3039486Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:05.3039864Z 2025-05-07T20:00:05.3040826Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is 
deprecated 2025-05-07T20:00:05.3041958Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.3042410Z ^ 2025-05-07T20:00:05.3042685Z detected during: 2025-05-07T20:00:05.3056963Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:05.3085921Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.3114557Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.3143437Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.3159728Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.3160920Z 2025-05-07T20:00:05.3161734Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.3162915Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.3163376Z ^ 2025-05-07T20:00:05.3163674Z detected during: 2025-05-07T20:00:05.3178527Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:05.3207058Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 
340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.3237045Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.3253323Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.3254516Z 2025-05-07T20:00:05.3254775Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:05.3255145Z 2025-05-07T20:00:05.3255981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.3257110Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.3257558Z ^ 2025-05-07T20:00:05.3257805Z detected during: 2025-05-07T20:00:05.3272098Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:05.3301052Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.3329185Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.3357837Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.3374072Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.3375248Z 2025-05-07T20:00:05.3376113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.3377273Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.3377762Z ^ 2025-05-07T20:00:05.3378041Z detected during: 2025-05-07T20:00:05.3392954Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.3421331Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.3449835Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.3466248Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.3467416Z 2025-05-07T20:00:05.3467697Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:00:05.3468068Z 2025-05-07T20:00:05.3468944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:05.3470098Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:05.3470521Z ^ 2025-05-07T20:00:05.3470779Z detected during: 2025-05-07T20:00:05.3484831Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:05.3513969Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:05.3542205Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:05.3571844Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:05.3588267Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:00:05.3589472Z 2025-05-07T20:00:06.0184744Z [132/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o 
-MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o 2025-05-07T20:00:06.0197563Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:06.0199188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0200593Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0201098Z ^ 2025-05-07T20:00:06.0201331Z 2025-05-07T20:00:06.0201586Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.0201978Z 2025-05-07T20:00:06.0202807Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0204109Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:06.0204653Z ^ 2025-05-07T20:00:06.0204832Z 2025-05-07T20:00:06.0205639Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0206817Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0207317Z ^ 2025-05-07T20:00:06.0207614Z detected during: 2025-05-07T20:00:06.0224341Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0252595Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0281309Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0297645Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0298833Z 2025-05-07T20:00:06.0299091Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.0299534Z 2025-05-07T20:00:06.0300522Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0301690Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0302115Z ^ 2025-05-07T20:00:06.0302381Z detected during: 2025-05-07T20:00:06.0316532Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, 
SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:06.0345247Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0373241Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0401961Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0418171Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0419389Z 2025-05-07T20:00:06.0420254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0421435Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0421895Z ^ 2025-05-07T20:00:06.0422194Z detected during: 2025-05-07T20:00:06.0436983Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0465007Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0493470Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0509803Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0511005Z 2025-05-07T20:00:06.0511263Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.0511636Z 2025-05-07T20:00:06.0512478Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0513613Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0514060Z ^ 2025-05-07T20:00:06.0514303Z detected during: 2025-05-07T20:00:06.0528490Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:06.0558090Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0586276Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0615012Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0631329Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0632497Z 2025-05-07T20:00:06.0633322Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0634468Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0634923Z ^ 2025-05-07T20:00:06.0635186Z detected during: 2025-05-07T20:00:06.0650004Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0681252Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0710376Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0726769Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0727957Z 2025-05-07T20:00:06.0728214Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.0728587Z 2025-05-07T20:00:06.0729426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0730559Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0731058Z ^ 2025-05-07T20:00:06.0731299Z detected during: 2025-05-07T20:00:06.0745576Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:06.0774273Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0802582Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0831074Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0847174Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0848320Z 2025-05-07T20:00:06.0849149Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0850299Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0850754Z ^ 2025-05-07T20:00:06.0851030Z detected during: 2025-05-07T20:00:06.0867074Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.0895200Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.0923933Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.0940148Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.0941368Z 2025-05-07T20:00:06.0941651Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.0942025Z 2025-05-07T20:00:06.0942839Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.0943994Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.0944445Z ^ 2025-05-07T20:00:06.0944687Z detected during: 2025-05-07T20:00:06.0958740Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:06.0987531Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.1015739Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.1044285Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.1060526Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.1061687Z 2025-05-07T20:00:06.1062558Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.1063751Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.1064237Z ^ 2025-05-07T20:00:06.1064523Z detected during: 2025-05-07T20:00:06.1079310Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.1107697Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.1136246Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.1152557Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.1153718Z 2025-05-07T20:00:06.1153976Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.1154366Z 2025-05-07T20:00:06.1155182Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.1156369Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:00:06.1156786Z ^ 2025-05-07T20:00:06.1157080Z detected during: 2025-05-07T20:00:06.1171211Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:06.1201161Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.1229295Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.1257901Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.1274272Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.1275474Z 2025-05-07T20:00:06.1276292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.1277490Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.1277952Z ^ 2025-05-07T20:00:06.1278258Z detected during: 2025-05-07T20:00:06.1293016Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.1321479Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.1350004Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.1366268Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.1367483Z 2025-05-07T20:00:06.1367739Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:06.1368110Z 2025-05-07T20:00:06.1368952Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:06.1370086Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:06.1370529Z ^ 2025-05-07T20:00:06.1370772Z detected during: 2025-05-07T20:00:06.1384902Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:06.1413858Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:06.1442136Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:06.1470914Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:06.1487196Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:00:06.1488404Z 2025-05-07T20:00:17.2228664Z [133/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o 2025-05-07T20:00:17.2241659Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:17.2243258Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2244589Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2245152Z ^ 2025-05-07T20:00:17.2245331Z 2025-05-07T20:00:17.2245586Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2245985Z 2025-05-07T20:00:17.2246819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2248001Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:17.2248493Z ^ 2025-05-07T20:00:17.2248673Z 2025-05-07T20:00:17.2249510Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2250671Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2251158Z ^ 2025-05-07T20:00:17.2251439Z detected during: 2025-05-07T20:00:17.2266807Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.2297862Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.2327622Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:00:17.2344320Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:00:17.2345511Z 2025-05-07T20:00:17.2345772Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2346164Z 2025-05-07T20:00:17.2346972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2348162Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2348623Z ^ 2025-05-07T20:00:17.2348927Z detected during: 2025-05-07T20:00:17.2364114Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.2392908Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.2422180Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.2438692Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:00:17.2439911Z 2025-05-07T20:00:17.2440197Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2440568Z 
2025-05-07T20:00:17.2441387Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2442584Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2443069Z ^ 2025-05-07T20:00:17.2443352Z detected during: 2025-05-07T20:00:17.2458369Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.2487006Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.2516294Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.2533108Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:00:17.2534275Z 2025-05-07T20:00:17.2534536Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2534929Z 2025-05-07T20:00:17.2536186Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2538796Z ptxas 
/tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2541471Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2544041Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2546672Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2549281Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2551879Z ptxas 
/tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2554440Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:17.2556603Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2557788Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2558253Z ^ 2025-05-07T20:00:17.2558558Z detected during: 2025-05-07T20:00:17.2573597Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.2602536Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.2632431Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.2648888Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:00:17.2650109Z 2025-05-07T20:00:17.2650370Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2650741Z 2025-05-07T20:00:17.2651584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2652750Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2653267Z ^ 2025-05-07T20:00:17.2653562Z detected during: 2025-05-07T20:00:17.2668722Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.2697218Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.2726741Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.2743350Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:00:17.2744512Z 2025-05-07T20:00:17.2744793Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2745162Z 2025-05-07T20:00:17.2745970Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:17.2747158Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:17.2747638Z ^ 2025-05-07T20:00:17.2747917Z detected during: 2025-05-07T20:00:17.2762944Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:17.2791620Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:17.2821122Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:17.2837776Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:00:17.2838963Z 2025-05-07T20:00:17.2839220Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:17.2839593Z 2025-05-07T20:00:19.1142718Z [134/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o 2025-05-07T20:00:19.1164775Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:19.1167522Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1169509Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.1170482Z ^ 2025-05-07T20:00:19.1170767Z 2025-05-07T20:00:19.1171170Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.1171794Z 2025-05-07T20:00:19.1173169Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1175181Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:19.1175948Z ^ 2025-05-07T20:00:19.1176250Z 2025-05-07T20:00:19.1177606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1179610Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.1180398Z ^ 2025-05-07T20:00:19.1180831Z detected during: 2025-05-07T20:00:19.1207752Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.1258325Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.1310025Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.1339182Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.1341316Z 2025-05-07T20:00:19.1341781Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.1342407Z 2025-05-07T20:00:19.1343804Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1345804Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.1346494Z ^ 2025-05-07T20:00:19.1347050Z detected during: 2025-05-07T20:00:19.1372291Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.1423186Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.1471660Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.1521925Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.1550485Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.1552514Z 2025-05-07T20:00:19.1553902Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1555850Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.1556650Z ^ 2025-05-07T20:00:19.1557080Z detected during: 2025-05-07T20:00:19.1583396Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.1631827Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.1683914Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.1712097Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.1714465Z 2025-05-07T20:00:19.1714895Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.1715558Z 2025-05-07T20:00:19.1716975Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1718903Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.1719599Z ^ 
2025-05-07T20:00:19.1719983Z detected during: 2025-05-07T20:00:19.1745029Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.1795423Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.1844460Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.1894632Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.1922745Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.1924816Z 2025-05-07T20:00:19.1926566Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.1928566Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.1929331Z ^ 2025-05-07T20:00:19.1929796Z detected during: 2025-05-07T20:00:19.1955732Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2005329Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.2055726Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.2084653Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.2086570Z 2025-05-07T20:00:19.2086945Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.2087715Z 2025-05-07T20:00:19.2089109Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.2090982Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.2091688Z ^ 2025-05-07T20:00:19.2092023Z detected during: 2025-05-07T20:00:19.2117059Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.2167453Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2216748Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.2269621Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.2298850Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.2301449Z 2025-05-07T20:00:19.2302993Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.2305035Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.2305809Z ^ 2025-05-07T20:00:19.2306230Z detected during: 2025-05-07T20:00:19.2332704Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2381294Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.2431519Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.2460880Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.2463037Z 2025-05-07T20:00:19.2463533Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.2464203Z 2025-05-07T20:00:19.2465595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.2467612Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.2468308Z ^ 
2025-05-07T20:00:19.2468708Z detected during: 2025-05-07T20:00:19.2493158Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.2544444Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2592773Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.2643186Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.2671786Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.2673806Z 2025-05-07T20:00:19.2675267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.2677306Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.2678086Z ^ 2025-05-07T20:00:19.2678502Z detected during: 2025-05-07T20:00:19.2704685Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2754983Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.2804632Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.2821356Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.2822542Z 2025-05-07T20:00:19.2822801Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.2823202Z 2025-05-07T20:00:19.2824014Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.2825155Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.2825628Z ^ 2025-05-07T20:00:19.2825893Z detected during: 2025-05-07T20:00:19.2840139Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.2869099Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2898699Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.2927936Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.2944444Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.2945657Z 2025-05-07T20:00:19.2946486Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.2947736Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.2948209Z ^ 2025-05-07T20:00:19.2948513Z detected during: 2025-05-07T20:00:19.2963418Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.2991850Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.3020895Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.3037175Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.3038369Z 2025-05-07T20:00:19.3038632Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:19.3039011Z 2025-05-07T20:00:19.3039885Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:19.3041057Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:19.3041488Z ^ 
2025-05-07T20:00:19.3041712Z detected during: 2025-05-07T20:00:19.3055775Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:19.3084621Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:19.3113056Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:19.3141658Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:19.3157953Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:00:19.3159117Z 2025-05-07T20:00:20.4814918Z [135/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o 2025-05-07T20:00:20.4827754Z nvcc warning : Support for offline compilation for architectures 
prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:20.4829349Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.4830531Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.4830992Z ^ 2025-05-07T20:00:20.4831185Z 2025-05-07T20:00:20.4831440Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.4831799Z 2025-05-07T20:00:20.4832621Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.4833895Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:20.4834347Z ^ 2025-05-07T20:00:20.4834518Z 2025-05-07T20:00:20.4835320Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.4836474Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.4836924Z ^ 2025-05-07T20:00:20.4837217Z detected during: 2025-05-07T20:00:20.4852248Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:20.4882412Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:20.4911915Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, 
cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:20.4928567Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:00:20.4929745Z 2025-05-07T20:00:20.4929989Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.4930365Z 2025-05-07T20:00:20.4931172Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.4932329Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.4932773Z ^ 2025-05-07T20:00:20.4933045Z detected during: 2025-05-07T20:00:20.4948170Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:20.4976793Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:20.5006193Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:20.5022773Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 
2025-05-07T20:00:20.5023943Z 2025-05-07T20:00:20.5024221Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.5024597Z 2025-05-07T20:00:20.5025418Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.5026601Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.5027084Z ^ 2025-05-07T20:00:20.5027358Z detected during: 2025-05-07T20:00:20.5042401Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:20.5071009Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:20.5100107Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:20.5116834Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:00:20.5117999Z 2025-05-07T20:00:20.5118251Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.5118643Z 2025-05-07T20:00:20.5119902Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected 
to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5122596Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5125232Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5127796Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5130391Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5132983Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future 
architectures 2025-05-07T20:00:20.5135656Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5138249Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:20.5140456Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.5141634Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.5142092Z ^ 2025-05-07T20:00:20.5142394Z detected during: 2025-05-07T20:00:20.5157444Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:20.5185968Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:20.5216478Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:20.5246506Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, 
TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:00:20.5247683Z 2025-05-07T20:00:20.5247970Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.5248339Z 2025-05-07T20:00:20.5249154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.5250339Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.5250824Z ^ 2025-05-07T20:00:20.5251099Z detected during: 2025-05-07T20:00:20.5266413Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:20.5294976Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:20.5324321Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:20.5340929Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:00:20.5342115Z 2025-05-07T20:00:20.5342369Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.5342796Z 2025-05-07T20:00:20.5343609Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:20.5344770Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:20.5345252Z ^ 2025-05-07T20:00:20.5345555Z detected during: 2025-05-07T20:00:20.5360537Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:20.5389253Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:20.5418533Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:20.5435162Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:00:20.5436333Z 2025-05-07T20:00:20.5436640Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:20.5437019Z 2025-05-07T20:00:22.2760795Z [136/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o 2025-05-07T20:00:22.2784881Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:22.2787663Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.2789769Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.2790532Z ^ 2025-05-07T20:00:22.2790859Z 2025-05-07T20:00:22.2791307Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.2791978Z 2025-05-07T20:00:22.2793442Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.2795479Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:22.2796539Z ^ 2025-05-07T20:00:22.2796845Z 2025-05-07T20:00:22.2798444Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.2800718Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.2801502Z ^ 2025-05-07T20:00:22.2801989Z detected during: 2025-05-07T20:00:22.2829740Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.2882159Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.2936280Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.2966741Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:00:22.2969076Z 2025-05-07T20:00:22.2969544Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.2970270Z 2025-05-07T20:00:22.2971667Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.2973820Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.2974622Z ^ 2025-05-07T20:00:22.2975103Z detected during: 2025-05-07T20:00:22.3002764Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.3058088Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.3111506Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.3141976Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:00:22.3144222Z 2025-05-07T20:00:22.3144743Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.3145410Z 2025-05-07T20:00:22.3146935Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.3149187Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.3150013Z ^ 2025-05-07T20:00:22.3150635Z detected during: 2025-05-07T20:00:22.3178143Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.3231599Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.3284750Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.3315268Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:00:22.3317352Z 2025-05-07T20:00:22.3317809Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.3318528Z 2025-05-07T20:00:22.3320270Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.3322458Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.3323273Z ^ 2025-05-07T20:00:22.3323739Z detected during: 2025-05-07T20:00:22.3351129Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.3403701Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.3456340Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.3486574Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:00:22.3488698Z 2025-05-07T20:00:22.3489245Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.3489882Z 2025-05-07T20:00:22.3491349Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.3493413Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.3494201Z ^ 2025-05-07T20:00:22.3494641Z detected during: 2025-05-07T20:00:22.3522658Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.3574757Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.3629449Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.3659483Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:00:22.3661596Z 2025-05-07T20:00:22.3662025Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.3662689Z 2025-05-07T20:00:22.3664127Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.3666203Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.3666978Z ^ 2025-05-07T20:00:22.3667441Z detected during: 
2025-05-07T20:00:22.3694595Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.3746800Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.3799495Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.3829635Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:00:22.3831735Z 2025-05-07T20:00:22.3832173Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.3832809Z 2025-05-07T20:00:22.7488854Z [137/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o 
2025-05-07T20:00:22.7512920Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:22.7516076Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.7518257Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.7519092Z ^ 2025-05-07T20:00:22.7519431Z 2025-05-07T20:00:22.7519881Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.7520562Z 2025-05-07T20:00:22.7522123Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.7524292Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:22.7525117Z ^ 2025-05-07T20:00:22.7525415Z 2025-05-07T20:00:22.7526911Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.7529110Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.7529948Z ^ 2025-05-07T20:00:22.7530402Z detected during: 2025-05-07T20:00:22.7559221Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.7611349Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.7665371Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.7695896Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:00:22.7698256Z 2025-05-07T20:00:22.7698748Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.7699542Z 2025-05-07T20:00:22.7701209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.7703559Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.7704356Z ^ 2025-05-07T20:00:22.7704867Z detected during: 2025-05-07T20:00:22.7732401Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.7785959Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.7841603Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.7872693Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, 
at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:00:22.7874915Z 2025-05-07T20:00:22.7875363Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.7876042Z 2025-05-07T20:00:22.7877527Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.7879622Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.7880401Z ^ 2025-05-07T20:00:22.7880859Z detected during: 2025-05-07T20:00:22.7908985Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.7962425Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.8017053Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.8048306Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:00:22.8050493Z 2025-05-07T20:00:22.8050950Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.8051629Z 2025-05-07T20:00:22.8053137Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.8055379Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.8056214Z ^ 2025-05-07T20:00:22.8056693Z detected during: 2025-05-07T20:00:22.8085352Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.8139402Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.8193420Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.8223690Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:00:22.8226102Z 2025-05-07T20:00:22.8226555Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.8227237Z 2025-05-07T20:00:22.8228774Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.8230959Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.8231785Z ^ 2025-05-07T20:00:22.8232361Z detected during: 
2025-05-07T20:00:22.8260101Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.8314083Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:22.8367708Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.8398388Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:00:22.8400765Z 2025-05-07T20:00:22.8401226Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.8401918Z 2025-05-07T20:00:22.8403455Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:22.8405617Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:22.8406416Z ^ 2025-05-07T20:00:22.8406894Z detected during: 2025-05-07T20:00:22.8436322Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:22.8490024Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:00:22.8544537Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:22.8575305Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:00:22.8577484Z 2025-05-07T20:00:22.8577934Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:22.8578573Z 2025-05-07T20:00:25.2917159Z [138/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o 2025-05-07T20:00:25.2937057Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:25.2939823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.2941830Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.2942770Z ^ 2025-05-07T20:00:25.2943053Z 2025-05-07T20:00:25.2943465Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.2944026Z 2025-05-07T20:00:25.2945164Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.2946843Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:25.2947463Z ^ 2025-05-07T20:00:25.2947691Z 2025-05-07T20:00:25.2948832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.2950693Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.2951522Z ^ 2025-05-07T20:00:25.2951984Z detected during: 2025-05-07T20:00:25.2975874Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:25.3024060Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:25.3068774Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:00:25.3094153Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:00:25.3095917Z 2025-05-07T20:00:25.3096317Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.3096853Z 2025-05-07T20:00:25.3098063Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.3099959Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.3101048Z ^ 2025-05-07T20:00:25.3101481Z detected during: 2025-05-07T20:00:25.3124722Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:25.3168603Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:25.3213180Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:25.3238464Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:00:25.3240266Z 2025-05-07T20:00:25.3240626Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.3241201Z 
2025-05-07T20:00:25.3242415Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.3244142Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.3244825Z ^ 2025-05-07T20:00:25.3245230Z detected during: 2025-05-07T20:00:25.3268283Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:25.3314245Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:25.3358803Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:25.3384034Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:00:25.3385882Z 2025-05-07T20:00:25.3386286Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.3386816Z 2025-05-07T20:00:25.3388038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.3389799Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.3390488Z ^ 2025-05-07T20:00:25.3390917Z detected during: 2025-05-07T20:00:25.3414267Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:25.3458272Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:25.3503293Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:25.3527801Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:00:25.3529541Z 2025-05-07T20:00:25.3529924Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.3530458Z 2025-05-07T20:00:25.3531672Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.3533394Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.3534155Z ^ 2025-05-07T20:00:25.3534523Z detected during: 2025-05-07T20:00:25.3557249Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:25.3600529Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:25.3644315Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:25.3669083Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:00:25.3670859Z 2025-05-07T20:00:25.3671226Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.3671774Z 2025-05-07T20:00:25.3673123Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:25.3674856Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:25.3675527Z ^ 2025-05-07T20:00:25.3675932Z detected during: 2025-05-07T20:00:25.3698598Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:25.3742278Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:25.3787461Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:25.3812438Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:00:25.3814349Z 2025-05-07T20:00:25.3814702Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:25.3815233Z 2025-05-07T20:00:26.1427146Z [139/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a 
-Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o 2025-05-07T20:00:26.1450182Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:26.1453064Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1455318Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.1456109Z ^ 2025-05-07T20:00:26.1456434Z 2025-05-07T20:00:26.1456834Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.1457442Z 2025-05-07T20:00:26.1458902Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1461004Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:26.1461920Z ^ 2025-05-07T20:00:26.1462216Z 2025-05-07T20:00:26.1463670Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1465761Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.1466488Z ^ 2025-05-07T20:00:26.1466892Z detected during: 2025-05-07T20:00:26.1493784Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:26.1545564Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:26.1597640Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:00:26.1627987Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:00:26.1630096Z 2025-05-07T20:00:26.1630539Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.1631179Z 2025-05-07T20:00:26.1632662Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1634752Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.1635502Z ^ 2025-05-07T20:00:26.1635952Z detected during: 2025-05-07T20:00:26.1662873Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:26.1715135Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:26.1767527Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:26.1797050Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:00:26.1799114Z 2025-05-07T20:00:26.1799560Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.1800418Z 
2025-05-07T20:00:26.1801824Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1803883Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.1804918Z ^ 2025-05-07T20:00:26.1805374Z detected during: 2025-05-07T20:00:26.1832940Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:26.1885790Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:26.1939847Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:26.1972437Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:00:26.1974556Z 2025-05-07T20:00:26.1975013Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.1975689Z 2025-05-07T20:00:26.1977175Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.1979405Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.1980282Z ^ 2025-05-07T20:00:26.1980764Z detected during: 2025-05-07T20:00:26.2008853Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:26.2062180Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:26.2116579Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:26.2147082Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:00:26.2149293Z 2025-05-07T20:00:26.2149740Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.2150396Z 2025-05-07T20:00:26.2151873Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.2153988Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.2154798Z ^ 2025-05-07T20:00:26.2155254Z detected during: 2025-05-07T20:00:26.2182733Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:26.2234346Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:26.2286998Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:26.2316742Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:00:26.2318807Z 2025-05-07T20:00:26.2319264Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.2319873Z 2025-05-07T20:00:26.2321247Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:26.2323278Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:26.2324031Z ^ 2025-05-07T20:00:26.2324468Z detected during: 2025-05-07T20:00:26.2351845Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:26.2403032Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:26.2454791Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:26.2484627Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:00:26.2486832Z 2025-05-07T20:00:26.2487266Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:26.2487894Z 2025-05-07T20:00:27.3651358Z [140/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a 
-Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o 2025-05-07T20:00:27.3669757Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:27.3672090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.3673735Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.3674387Z ^ 2025-05-07T20:00:27.3674642Z 2025-05-07T20:00:27.3675016Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.3675540Z 2025-05-07T20:00:27.3676692Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.3678575Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:27.3679196Z ^ 2025-05-07T20:00:27.3679459Z 2025-05-07T20:00:27.3680618Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.3682274Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.3682888Z ^ 2025-05-07T20:00:27.3685230Z detected during: 2025-05-07T20:00:27.3707363Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:27.3756578Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:27.3803134Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:27.3826263Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:00:27.3827939Z 2025-05-07T20:00:27.3828300Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.3828818Z 2025-05-07T20:00:27.3829983Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.3831628Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.3832267Z ^ 2025-05-07T20:00:27.3832628Z detected during: 2025-05-07T20:00:27.3856140Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:27.3906625Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:27.3958148Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:27.3987287Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:00:27.3989264Z 2025-05-07T20:00:27.3989674Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.3990314Z 2025-05-07T20:00:27.3991691Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.3993691Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.3994439Z ^ 2025-05-07T20:00:27.3994891Z detected during: 2025-05-07T20:00:27.4021792Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:27.4072804Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:27.4124605Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:27.4154242Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:00:27.4156347Z 2025-05-07T20:00:27.4156781Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.4157414Z 2025-05-07T20:00:27.4158967Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.4161014Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.4161824Z ^ 2025-05-07T20:00:27.4162268Z detected during: 2025-05-07T20:00:27.4189129Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:27.4241287Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:27.4292845Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:27.4322737Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:00:27.4324809Z 2025-05-07T20:00:27.4325368Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.4326007Z 2025-05-07T20:00:27.4327396Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.4329454Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.4330215Z ^ 2025-05-07T20:00:27.4330682Z detected during: 
2025-05-07T20:00:27.4357527Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:27.4408833Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:27.4460424Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:27.4490007Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:00:27.4492102Z 2025-05-07T20:00:27.4492543Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.4493181Z 2025-05-07T20:00:27.4494649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:27.4496661Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:27.4497475Z ^ 2025-05-07T20:00:27.4497931Z detected during: 2025-05-07T20:00:27.4525416Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:27.4576091Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:00:27.4627919Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:27.4657140Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:00:27.4659164Z 2025-05-07T20:00:27.4659715Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:27.4660351Z 2025-05-07T20:00:28.4791030Z [141/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o 2025-05-07T20:00:28.4814632Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:28.4817490Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.4819745Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.4820539Z ^ 2025-05-07T20:00:28.4820858Z 2025-05-07T20:00:28.4821331Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.4821995Z 2025-05-07T20:00:28.4823458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.4825652Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:28.4826470Z ^ 2025-05-07T20:00:28.4826771Z 2025-05-07T20:00:28.4828197Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.4830287Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.4831054Z ^ 2025-05-07T20:00:28.4831703Z detected during: 2025-05-07T20:00:28.4858345Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.4909538Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.4962115Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.4991097Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.4993212Z 2025-05-07T20:00:28.4993663Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.4994340Z 2025-05-07T20:00:28.4995786Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.4997804Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.4998536Z ^ 2025-05-07T20:00:28.4998957Z detected during: 2025-05-07T20:00:28.5024567Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:28.5076433Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.5127781Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.5179070Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.5208236Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.5210435Z 2025-05-07T20:00:28.5211871Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.5214132Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.5214903Z ^ 2025-05-07T20:00:28.5215369Z detected during: 2025-05-07T20:00:28.5242297Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.5292489Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.5343814Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.5372943Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.5375034Z 2025-05-07T20:00:28.5375477Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.5376134Z 2025-05-07T20:00:28.5377586Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.5379750Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.5380485Z ^ 2025-05-07T20:00:28.5380866Z detected during: 2025-05-07T20:00:28.5406361Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:00:28.5458245Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.5508934Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.5561600Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.5590620Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.5592800Z 2025-05-07T20:00:28.5594274Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.5596360Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.5597182Z ^ 2025-05-07T20:00:28.5597623Z detected during: 2025-05-07T20:00:28.5624348Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.5674578Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.5725778Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.5754726Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.5756803Z 2025-05-07T20:00:28.5757222Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.5757874Z 2025-05-07T20:00:28.5759485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.5761441Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.5762139Z ^ 2025-05-07T20:00:28.5762541Z detected during: 2025-05-07T20:00:28.5787894Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:28.5839439Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.5889355Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.5940928Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.5970006Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.5972190Z 2025-05-07T20:00:28.5974392Z ptxas /tmp/tmpxft_00007e82_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 835; warning : Advisory: 
'.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:28.5979135Z ptxas /tmp/tmpxft_00007e82_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:28.5983903Z ptxas /tmp/tmpxft_00007e82_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:28.5988472Z ptxas /tmp/tmpxft_00007e82_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:28.5992375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.5994480Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.5995255Z ^ 2025-05-07T20:00:28.5995728Z detected during: 2025-05-07T20:00:28.6022555Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:28.6072410Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.6124868Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.6154095Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.6156127Z 2025-05-07T20:00:28.6156548Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.6157318Z 2025-05-07T20:00:28.6158791Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.6160878Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.6161609Z ^ 2025-05-07T20:00:28.6161995Z detected during: 2025-05-07T20:00:28.6187476Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:28.6239261Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.6290310Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.6324318Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:00:28.6340611Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.6341791Z 2025-05-07T20:00:28.6342677Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.6343854Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.6344368Z ^ 2025-05-07T20:00:28.6344668Z detected during: 2025-05-07T20:00:28.6359458Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.6387409Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.6416992Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.6433246Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.6434481Z 2025-05-07T20:00:28.6434765Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.6435137Z 2025-05-07T20:00:28.6435958Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:00:28.6437128Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.6437611Z ^ 2025-05-07T20:00:28.6437854Z detected during: 2025-05-07T20:00:28.6451933Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:28.6480699Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.6508922Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.6537423Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.6553697Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.6554870Z 2025-05-07T20:00:28.6555715Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.6556913Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.6557399Z ^ 2025-05-07T20:00:28.6557685Z detected during: 2025-05-07T20:00:28.6572357Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.6600496Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.6628879Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.6645058Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.6646219Z 2025-05-07T20:00:28.6646477Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:28.6646866Z 2025-05-07T20:00:28.6647674Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:28.6648870Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:28.6649284Z ^ 2025-05-07T20:00:28.6649542Z detected during: 2025-05-07T20:00:28.6663683Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:28.6692160Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:28.6720348Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:28.6749505Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:28.6765644Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:00:28.6766829Z 2025-05-07T20:00:30.2196749Z [142/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o 2025-05-07T20:00:30.2209731Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:30.2222725Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.2224053Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.2224547Z ^ 2025-05-07T20:00:30.2224735Z 2025-05-07T20:00:30.2224994Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:30.2225405Z 2025-05-07T20:00:30.2226240Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.2227571Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:30.2228026Z ^ 2025-05-07T20:00:30.2228206Z 2025-05-07T20:00:30.8355389Z [143/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o 2025-05-07T20:00:30.8368174Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:30.8369779Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8370976Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8371446Z ^ 2025-05-07T20:00:30.8371653Z 2025-05-07T20:00:30.8371907Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:30.8372279Z 2025-05-07T20:00:30.8373130Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8374311Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:30.8374789Z ^ 2025-05-07T20:00:30.8374963Z 2025-05-07T20:00:30.8375895Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8377060Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8377542Z ^ 2025-05-07T20:00:30.8377814Z detected during: 2025-05-07T20:00:30.8392850Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.8423057Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.8451760Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.8468113Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.8469277Z 2025-05-07T20:00:30.8469531Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:30.8469958Z 2025-05-07T20:00:30.8470767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8471920Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8472336Z ^ 2025-05-07T20:00:30.8472598Z detected during: 2025-05-07T20:00:30.8486749Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:30.8515704Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.8543900Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.8572459Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.8591873Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.8593053Z 2025-05-07T20:00:30.8593864Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8595058Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8595522Z ^ 2025-05-07T20:00:30.8595829Z detected during: 2025-05-07T20:00:30.8610810Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.8639189Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.8667947Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.8684284Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.8685447Z 2025-05-07T20:00:30.8685708Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:30.8686105Z 2025-05-07T20:00:30.8686922Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8688078Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8688501Z ^ 2025-05-07T20:00:30.8688764Z detected during: 2025-05-07T20:00:30.8703281Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:00:30.8732038Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.8761075Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.8789758Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.8806185Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.8807384Z 2025-05-07T20:00:30.8808207Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8809406Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8809862Z ^ 2025-05-07T20:00:30.8810159Z detected during: 2025-05-07T20:00:30.8825173Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.8853283Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.8881976Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.8898160Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.8899343Z 2025-05-07T20:00:30.8899688Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:30.8900065Z 2025-05-07T20:00:30.8901092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.8902227Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.8902681Z ^ 2025-05-07T20:00:30.8902951Z detected during: 2025-05-07T20:00:30.8917163Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:30.8946134Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.8974249Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9003013Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9019299Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9020517Z 2025-05-07T20:00:30.9021332Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.9022558Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.9023016Z ^ 2025-05-07T20:00:30.9023313Z detected during: 2025-05-07T20:00:30.9038112Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.9067095Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9095685Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9112187Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9113379Z 2025-05-07T20:00:30.9113633Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:30.9114003Z 2025-05-07T20:00:30.9114898Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.9116112Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.9116575Z ^ 2025-05-07T20:00:30.9116842Z detected during: 2025-05-07T20:00:30.9130989Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:00:30.9159803Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.9187960Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9216559Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9232940Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9234136Z 2025-05-07T20:00:30.9234953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.9236151Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.9236612Z ^ 2025-05-07T20:00:30.9236917Z detected during: 2025-05-07T20:00:30.9251735Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.9279887Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9308815Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9325169Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9326381Z 2025-05-07T20:00:30.9326639Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-05-07T20:00:30.9327012Z 2025-05-07T20:00:30.9327856Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.9328994Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.9329438Z ^ 2025-05-07T20:00:30.9329711Z detected during: 2025-05-07T20:00:30.9343949Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:30.9372775Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.9402014Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9430745Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9447166Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9448343Z 2025-05-07T20:00:30.9449158Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.9450339Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.9450795Z ^ 2025-05-07T20:00:30.9451095Z detected during: 2025-05-07T20:00:30.9465955Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.9494160Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9523128Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9539354Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9540577Z 2025-05-07T20:00:30.9540836Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-05-07T20:00:30.9541204Z 2025-05-07T20:00:30.9542017Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:30.9543167Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:30.9543602Z ^ 2025-05-07T20:00:30.9543843Z detected during: 2025-05-07T20:00:30.9557896Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:00:30.9586598Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:30.9614873Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:30.9643602Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:30.9659881Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:00:30.9661041Z 2025-05-07T20:00:33.9393512Z [144/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o 2025-05-07T20:00:33.9406311Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:33.9407943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9409115Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9409604Z ^ 2025-05-07T20:00:33.9409784Z 2025-05-07T20:00:33.9410066Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9410437Z 2025-05-07T20:00:33.9411268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9412477Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:33.9412965Z ^ 2025-05-07T20:00:33.9413144Z 2025-05-07T20:00:33.9413953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9415125Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9415578Z ^ 2025-05-07T20:00:33.9415874Z detected during: 2025-05-07T20:00:33.9431068Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:33.9459762Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:33.9488748Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:33.9505398Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:00:33.9506579Z 2025-05-07T20:00:33.9506824Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9507185Z 2025-05-07T20:00:33.9507998Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9509198Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9509646Z ^ 2025-05-07T20:00:33.9509903Z detected during: 2025-05-07T20:00:33.9525009Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:33.9553618Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:33.9582715Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:33.9599436Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:00:33.9600802Z 2025-05-07T20:00:33.9601086Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9601554Z 2025-05-07T20:00:33.9602366Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9603555Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9604014Z ^ 2025-05-07T20:00:33.9604315Z detected during: 2025-05-07T20:00:33.9619491Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:33.9648183Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:33.9677256Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:33.9693733Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:00:33.9694915Z 2025-05-07T20:00:33.9695170Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9695540Z 2025-05-07T20:00:33.9696372Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9697535Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9698019Z ^ 2025-05-07T20:00:33.9698323Z detected during: 2025-05-07T20:00:33.9713854Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:33.9742550Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:33.9771685Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:33.9788284Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:00:33.9789450Z 2025-05-07T20:00:33.9789735Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9790100Z 2025-05-07T20:00:33.9790907Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9792090Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9792618Z ^ 2025-05-07T20:00:33.9792896Z detected during: 
2025-05-07T20:00:33.9808121Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:33.9836850Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:33.9866095Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:33.9882621Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:00:33.9883799Z 2025-05-07T20:00:33.9884054Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9884427Z 2025-05-07T20:00:33.9885328Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:33.9886517Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:33.9886994Z ^ 2025-05-07T20:00:33.9887298Z detected during: 2025-05-07T20:00:33.9902534Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:33.9931086Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:00:33.9960315Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:33.9976830Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:00:33.9977987Z 2025-05-07T20:00:33.9978262Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:33.9978636Z 2025-05-07T20:00:35.3270154Z [145/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o 2025-05-07T20:00:35.3282861Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:35.3284476Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3285739Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3286199Z ^ 2025-05-07T20:00:35.3286402Z 2025-05-07T20:00:35.3286659Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.3287027Z 2025-05-07T20:00:35.3287877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3289059Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:35.3289530Z ^ 2025-05-07T20:00:35.3289706Z 2025-05-07T20:00:35.3290506Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3291686Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3292161Z ^ 2025-05-07T20:00:35.3292430Z detected during: 2025-05-07T20:00:35.3307664Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.3337034Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.3365595Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.3381883Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.3383046Z 2025-05-07T20:00:35.3383299Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.3383693Z 2025-05-07T20:00:35.3384499Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3385654Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3386078Z ^ 2025-05-07T20:00:35.3386337Z detected during: 2025-05-07T20:00:35.3400597Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:35.3429197Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.3457213Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.3485604Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.3501933Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.3503123Z 2025-05-07T20:00:35.3503945Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3505130Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3505591Z ^ 2025-05-07T20:00:35.3505894Z detected during: 2025-05-07T20:00:35.3520607Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.3548693Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.3577021Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.3593216Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.3594404Z 2025-05-07T20:00:35.3594663Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.3595029Z 2025-05-07T20:00:35.3595869Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3597004Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3597453Z ^ 2025-05-07T20:00:35.3597698Z detected during: 2025-05-07T20:00:35.3611922Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:00:35.3640344Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.3668827Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.3697137Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.3713617Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.3714793Z 2025-05-07T20:00:35.3715636Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3716804Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3717284Z ^ 2025-05-07T20:00:35.3717557Z detected during: 2025-05-07T20:00:35.3732292Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.3760412Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.3788919Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.3805271Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.3806431Z 2025-05-07T20:00:35.3806718Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.3807090Z 2025-05-07T20:00:35.3807902Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3809051Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3809468Z ^ 2025-05-07T20:00:35.3809726Z detected during: 2025-05-07T20:00:35.3823918Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:35.3852445Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.3880446Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.3909113Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.3925319Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.3926470Z 2025-05-07T20:00:35.3927715Z ptxas /tmp/tmpxft_00007e88_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 835; warning : Advisory: 
'.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:35.3930314Z ptxas /tmp/tmpxft_00007e88_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:35.3932958Z ptxas /tmp/tmpxft_00007e88_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:35.3935572Z ptxas /tmp/tmpxft_00007e88_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:35.3937745Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.3938903Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.3939385Z ^ 2025-05-07T20:00:35.3939708Z detected during: 2025-05-07T20:00:35.3954393Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:35.3983106Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.4011542Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.4027869Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.4029034Z 2025-05-07T20:00:35.4029293Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.4029686Z 2025-05-07T20:00:35.4030502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.4031663Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.4032089Z ^ 2025-05-07T20:00:35.4032348Z detected during: 2025-05-07T20:00:35.4046395Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:35.4074872Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.4102925Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.4131370Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:00:35.4147582Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.4148784Z 2025-05-07T20:00:35.4149606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.4150802Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.4151270Z ^ 2025-05-07T20:00:35.4151575Z detected during: 2025-05-07T20:00:35.4166263Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.4194153Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.4222630Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.4238721Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.4239900Z 2025-05-07T20:00:35.4240158Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.4240529Z 2025-05-07T20:00:35.4241364Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:00:35.4242488Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.4242927Z ^ 2025-05-07T20:00:35.4243166Z detected during: 2025-05-07T20:00:35.4257218Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:35.4285786Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.4314708Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.4343091Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.4359210Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.4360391Z 2025-05-07T20:00:35.4361203Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.4362360Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.4362840Z ^ 2025-05-07T20:00:35.4363174Z detected during: 2025-05-07T20:00:35.4377801Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.4405813Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.4434281Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.4450336Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.4451486Z 2025-05-07T20:00:35.4451760Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:35.4452127Z 2025-05-07T20:00:35.4452944Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:35.4454095Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:35.4454529Z ^ 2025-05-07T20:00:35.4454767Z detected during: 2025-05-07T20:00:35.4468876Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:35.4497397Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:35.4525390Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:35.4553839Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:35.4569958Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:00:35.4571109Z 2025-05-07T20:00:55.0322775Z [146/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o 2025-05-07T20:00:55.0335274Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:55.0336931Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:55.0338130Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:55.0338617Z ^ 2025-05-07T20:00:55.0338794Z 2025-05-07T20:00:55.0339049Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:55.0339520Z 2025-05-07T20:00:55.0340359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:55.0341567Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:55.0342016Z ^ 2025-05-07T20:00:55.0342190Z 2025-05-07T20:01:01.2527867Z [147/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o 2025-05-07T20:01:01.2541357Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:01.2543043Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.2544244Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.2544736Z ^ 2025-05-07T20:01:01.2544914Z 2025-05-07T20:01:01.2545172Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:01.2545572Z 2025-05-07T20:01:01.2546402Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.2547600Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:01.2548058Z ^ 2025-05-07T20:01:01.2548242Z 2025-05-07T20:01:01.2549065Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.2550225Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.2550701Z ^ 2025-05-07T20:01:01.2550989Z detected during: 2025-05-07T20:01:01.2566040Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.2596252Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.2626044Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.2642686Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.2643826Z 2025-05-07T20:01:01.2644099Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:01.2644465Z 2025-05-07T20:01:01.2645260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.2646435Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.2646875Z ^ 2025-05-07T20:01:01.2647174Z detected during: 2025-05-07T20:01:01.2661230Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:01.2689882Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.2718582Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.2747349Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.2763776Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.2764901Z 2025-05-07T20:01:01.2765695Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:01:01.2766846Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.2767314Z ^ 2025-05-07T20:01:01.2767584Z detected during: 2025-05-07T20:01:01.2782481Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.2811316Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.2840742Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.2856804Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.2857939Z 2025-05-07T20:01:01.2858187Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:01.2858576Z 2025-05-07T20:01:01.2859369Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.2860734Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.2861158Z ^ 2025-05-07T20:01:01.2861424Z detected during: 2025-05-07T20:01:01.2875604Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:01.2904660Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.2934070Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.2963114Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.2979117Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.2980499Z 2025-05-07T20:01:01.2981325Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.2982513Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.2983015Z ^ 2025-05-07T20:01:01.2983296Z detected during: 2025-05-07T20:01:01.2998321Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3027572Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3056312Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3072974Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3074104Z 2025-05-07T20:01:01.3074358Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:01.3074791Z 2025-05-07T20:01:01.3075583Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3076710Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3077124Z ^ 2025-05-07T20:01:01.3077383Z detected during: 2025-05-07T20:01:01.3091208Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:01.3120701Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3149533Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3177921Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3194717Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3195896Z 2025-05-07T20:01:01.3196718Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3197868Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3198320Z ^ 2025-05-07T20:01:01.3198598Z detected during: 2025-05-07T20:01:01.3214028Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3243011Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3271746Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3287909Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3289051Z 2025-05-07T20:01:01.3289300Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:01.3289661Z 2025-05-07T20:01:01.3290478Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3291580Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3292004Z ^ 2025-05-07T20:01:01.3292235Z detected during: 2025-05-07T20:01:01.3306654Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:01.3335042Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3363535Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:01:01.3392558Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3409146Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3410304Z 2025-05-07T20:01:01.3411145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3412302Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3412872Z ^ 2025-05-07T20:01:01.3413146Z detected during: 2025-05-07T20:01:01.3428156Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3456119Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3484960Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3501242Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3502386Z 2025-05-07T20:01:01.3502646Z Remark: The warnings can 
be suppressed with "-diag-suppress " 2025-05-07T20:01:01.3503010Z 2025-05-07T20:01:01.3503811Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3504995Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3505414Z ^ 2025-05-07T20:01:01.3505636Z detected during: 2025-05-07T20:01:01.3519722Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:01.3548237Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, 
cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3577342Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3606877Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3623513Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3624665Z 2025-05-07T20:01:01.3625484Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3626674Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3627157Z ^ 2025-05-07T20:01:01.3627435Z detected during: 2025-05-07T20:01:01.3642347Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3670592Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3698987Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3715864Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3717020Z 2025-05-07T20:01:01.3717275Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:01.3717673Z 2025-05-07T20:01:01.3718486Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:01.3719637Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:01.3720057Z ^ 2025-05-07T20:01:01.3720326Z detected during: 2025-05-07T20:01:01.3734704Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, 
TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:01.3763511Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:01.3792201Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:01.3842044Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments 
&, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:01.3858198Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:01:01.3859357Z 2025-05-07T20:01:11.2942853Z [148/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o 2025-05-07T20:01:11.2955714Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:11.2957338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.2958508Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.2958994Z ^ 2025-05-07T20:01:11.2959174Z 2025-05-07T20:01:11.2959452Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.2959818Z 2025-05-07T20:01:11.2960649Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.2961849Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:11.2962304Z ^ 2025-05-07T20:01:11.2962501Z 2025-05-07T20:01:11.2963305Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.2964478Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.2964932Z ^ 2025-05-07T20:01:11.2965219Z detected during: 2025-05-07T20:01:11.2980387Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3009914Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3039040Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3055124Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3056273Z 2025-05-07T20:01:11.3056529Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.3056915Z 2025-05-07T20:01:11.3057709Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3058859Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3059267Z ^ 2025-05-07T20:01:11.3059567Z detected during: 2025-05-07T20:01:11.3073971Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:11.3102736Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3130985Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3159888Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3175898Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3177080Z 2025-05-07T20:01:11.3177896Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3179081Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3179604Z ^ 2025-05-07T20:01:11.3180071Z detected during: 2025-05-07T20:01:11.3195194Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3224077Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3252646Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3269257Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3270439Z 2025-05-07T20:01:11.3270694Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.3271064Z 2025-05-07T20:01:11.3271896Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3273116Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3273542Z ^ 2025-05-07T20:01:11.3273777Z detected during: 2025-05-07T20:01:11.3287593Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:11.3316579Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3345368Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3373969Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3390564Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3391715Z 2025-05-07T20:01:11.3392660Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3393789Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3394258Z ^ 2025-05-07T20:01:11.3394528Z detected during: 2025-05-07T20:01:11.3409787Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3438560Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3467255Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3483579Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3484709Z 2025-05-07T20:01:11.3484982Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.3485339Z 2025-05-07T20:01:11.3486131Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3487249Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3487664Z ^ 2025-05-07T20:01:11.3487954Z detected during: 2025-05-07T20:01:11.3502206Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:11.3530913Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3559461Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3588185Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3604949Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, 
TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3606104Z 2025-05-07T20:01:11.3606941Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3608135Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3608601Z ^ 2025-05-07T20:01:11.3608908Z detected during: 2025-05-07T20:01:11.3624249Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3653036Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3682406Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3698490Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3699705Z 2025-05-07T20:01:11.3700125Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.3700622Z 2025-05-07T20:01:11.3701456Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3702586Z return 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3703023Z ^ 2025-05-07T20:01:11.3703262Z detected during: 2025-05-07T20:01:11.3717440Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:11.3746246Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3774434Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3803908Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3820575Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3821733Z 2025-05-07T20:01:11.3822566Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3823728Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3824211Z ^ 2025-05-07T20:01:11.3824484Z detected during: 2025-05-07T20:01:11.3839479Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3867738Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.3896306Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.3913138Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.3914267Z 2025-05-07T20:01:11.3914541Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.3914990Z 2025-05-07T20:01:11.3915782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.3916898Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.3917309Z ^ 2025-05-07T20:01:11.3917570Z detected during: 2025-05-07T20:01:11.3931905Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:11.3960806Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.3989551Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.4018809Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.4035572Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.4036696Z 2025-05-07T20:01:11.4037512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.4038692Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.4039168Z ^ 2025-05-07T20:01:11.4039439Z detected during: 2025-05-07T20:01:11.4054045Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.4082587Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.4111539Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.4127870Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.4128999Z 2025-05-07T20:01:11.4129248Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:11.4129638Z 2025-05-07T20:01:11.4130431Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:11.4131548Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:11.4131960Z ^ 2025-05-07T20:01:11.4132223Z detected during: 2025-05-07T20:01:11.4149647Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:11.4178283Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:11.4207426Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:11.4236644Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:11.4252822Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:11.4253947Z 2025-05-07T20:01:12.4883921Z [149/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o 2025-05-07T20:01:12.4895912Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:12.4897468Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:12.4898629Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:12.4899057Z ^ 2025-05-07T20:01:12.4899239Z 2025-05-07T20:01:12.4899549Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:12.4900061Z 2025-05-07T20:01:12.4901177Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:12.4902348Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:12.4902797Z ^ 2025-05-07T20:01:12.4902966Z 2025-05-07T20:01:13.8388703Z [150/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o 2025-05-07T20:01:13.8401703Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:13.8403297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8404558Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8405046Z ^ 2025-05-07T20:01:13.8405291Z 2025-05-07T20:01:13.8405547Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.8405933Z 2025-05-07T20:01:13.8406761Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8407954Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:13.8408407Z ^ 2025-05-07T20:01:13.8408584Z 2025-05-07T20:01:13.8409412Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8410560Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8411034Z ^ 2025-05-07T20:01:13.8411312Z detected during: 2025-05-07T20:01:13.8426538Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.8453865Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.8482071Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.8497386Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.8498470Z 2025-05-07T20:01:13.8498732Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.8499074Z 2025-05-07T20:01:13.8500033Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8501345Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8501774Z ^ 2025-05-07T20:01:13.8502044Z detected during: 2025-05-07T20:01:13.8516175Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:13.8543441Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.8570609Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.8598728Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.8615409Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.8616547Z 2025-05-07T20:01:13.8617298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8618395Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8618819Z ^ 2025-05-07T20:01:13.8619104Z detected during: 2025-05-07T20:01:13.8634228Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.8661937Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.8689699Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.8705864Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.8707095Z 2025-05-07T20:01:13.8707350Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.8707721Z 2025-05-07T20:01:13.8708556Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8709687Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8710135Z ^ 2025-05-07T20:01:13.8710381Z detected during: 
2025-05-07T20:01:13.8723837Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" 
at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:13.8753554Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.8780155Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.8808966Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.8825281Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.8826427Z 2025-05-07T20:01:13.8827244Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8828389Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8828868Z ^ 2025-05-07T20:01:13.8829126Z detected during: 2025-05-07T20:01:13.8843325Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.8870837Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.8898251Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:01:13.8914961Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.8916118Z 2025-05-07T20:01:13.8916383Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.8916744Z 2025-05-07T20:01:13.8917547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.8918669Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.8919129Z ^ 2025-05-07T20:01:13.8919469Z detected during: 2025-05-07T20:01:13.8932689Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:13.8960576Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.8987702Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9016365Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9032738Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9033801Z 2025-05-07T20:01:13.9034539Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.9035637Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.9036062Z ^ 2025-05-07T20:01:13.9036325Z detected during: 2025-05-07T20:01:13.9050152Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.9078190Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9105726Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9121542Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9122614Z 2025-05-07T20:01:13.9122844Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.9123195Z 2025-05-07T20:01:13.9123933Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.9124968Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.9125359Z ^ 2025-05-07T20:01:13.9125604Z detected during: 2025-05-07T20:01:13.9138679Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, 
TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:13.9166651Z 
instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.9194421Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9223382Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9239370Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9240472Z 2025-05-07T20:01:13.9241221Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.9242320Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.9242750Z ^ 2025-05-07T20:01:13.9243027Z detected during: 2025-05-07T20:01:13.9256866Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.9284588Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9312890Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9328374Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9329471Z 2025-05-07T20:01:13.9329711Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.9330053Z 2025-05-07T20:01:13.9330828Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.9331868Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.9332274Z ^ 2025-05-07T20:01:13.9332501Z detected during: 2025-05-07T20:01:13.9346202Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:13.9373984Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.9402229Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9431122Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9446538Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9447654Z 2025-05-07T20:01:13.9448429Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:01:13.9449506Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.9449960Z ^ 2025-05-07T20:01:13.9450222Z detected during: 2025-05-07T20:01:13.9464749Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.9491743Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9519979Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9535332Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9536507Z 2025-05-07T20:01:13.9536769Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:13.9537113Z 2025-05-07T20:01:13.9537867Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:13.9538942Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:13.9539330Z ^ 2025-05-07T20:01:13.9539626Z detected during: 2025-05-07T20:01:13.9553885Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:13.9580913Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:13.9608956Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:13.9637320Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:13.9652532Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:13.9653605Z 2025-05-07T20:01:18.7291422Z [151/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o 2025-05-07T20:01:18.7304799Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:18.7306421Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7307697Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7308163Z ^ 2025-05-07T20:01:18.7308352Z 2025-05-07T20:01:18.7308633Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.7309008Z 2025-05-07T20:01:18.7309844Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7311045Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:18.7311521Z ^ 2025-05-07T20:01:18.7311702Z 2025-05-07T20:01:18.7312696Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7313787Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7314212Z ^ 2025-05-07T20:01:18.7314499Z detected during: 2025-05-07T20:01:18.7328540Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7355882Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.7383285Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.7399417Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.7400806Z 2025-05-07T20:01:18.7401063Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.7401432Z 2025-05-07T20:01:18.7402268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7403391Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7403831Z ^ 2025-05-07T20:01:18.7404100Z detected during: 2025-05-07T20:01:18.7417911Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:18.7446940Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7474588Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.7502034Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.7518272Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.7519374Z 2025-05-07T20:01:18.7520125Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7521217Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7521642Z ^ 2025-05-07T20:01:18.7521943Z detected during: 2025-05-07T20:01:18.7535847Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7563427Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.7591262Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.7607719Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.7608900Z 2025-05-07T20:01:18.7609153Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.7609525Z 2025-05-07T20:01:18.7610335Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7611491Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7611926Z ^ 2025-05-07T20:01:18.7612170Z detected during: 2025-05-07T20:01:18.7625989Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:18.7653263Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7680837Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.7708725Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.7724499Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.7725569Z 2025-05-07T20:01:18.7726337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7727407Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7727853Z ^ 2025-05-07T20:01:18.7728109Z detected during: 2025-05-07T20:01:18.7742368Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7769672Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.7797752Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.7814263Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.7815334Z 2025-05-07T20:01:18.7815625Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.7815970Z 2025-05-07T20:01:18.7816722Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7817788Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7818174Z ^ 2025-05-07T20:01:18.7818415Z detected during: 2025-05-07T20:01:18.7832695Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:18.7859480Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7887081Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.7915397Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.7930874Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, 
TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.7931944Z 2025-05-07T20:01:18.7932713Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.7933804Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.7934237Z ^ 2025-05-07T20:01:18.7934509Z detected during: 2025-05-07T20:01:18.7949334Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.7976061Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.8004686Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.8021045Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.8022232Z 2025-05-07T20:01:18.8022487Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.8022908Z 2025-05-07T20:01:18.8023715Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.8024846Z return 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.8025280Z ^ 2025-05-07T20:01:18.8025548Z detected during: 2025-05-07T20:01:18.8039242Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:18.8066600Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.8093527Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.8121756Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.8137059Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.8138154Z 2025-05-07T20:01:18.8138905Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.8140221Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.8140679Z ^ 2025-05-07T20:01:18.8141008Z detected during: 2025-05-07T20:01:18.8155829Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.8182566Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.8211062Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.8227430Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.8228621Z 2025-05-07T20:01:18.8228878Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.8229245Z 2025-05-07T20:01:18.8230078Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.8231201Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.8231639Z ^ 2025-05-07T20:01:18.8232299Z detected during: 2025-05-07T20:01:18.8245295Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:18.8273251Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.8299602Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.8327868Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.8343614Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.8344778Z 2025-05-07T20:01:18.8345609Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.8346797Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.8347274Z ^ 2025-05-07T20:01:18.8347553Z detected during: 2025-05-07T20:01:18.8361982Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.8389303Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.8417560Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.8434031Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.8435112Z 2025-05-07T20:01:18.8435373Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:18.8435719Z 2025-05-07T20:01:18.8436469Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:18.8437547Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:18.8437939Z ^ 2025-05-07T20:01:18.8438186Z detected during: 2025-05-07T20:01:18.8451132Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:18.8479130Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:18.8506405Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:18.8533757Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:18.8550238Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:01:18.8551405Z 2025-05-07T20:01:57.2201585Z [152/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o 2025-05-07T20:01:57.2214602Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:57.2216306Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:57.2217557Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:57.2217987Z ^ 2025-05-07T20:01:57.2218175Z 2025-05-07T20:01:57.2218489Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:57.2218835Z 2025-05-07T20:01:57.2219746Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:57.2221096Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:57.2221582Z ^ 2025-05-07T20:01:57.2221762Z 2025-05-07T20:01:59.4980504Z [153/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o 2025-05-07T20:01:59.4992874Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:59.4994388Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4995504Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:59.4995947Z ^ 2025-05-07T20:01:59.4996149Z 2025-05-07T20:01:59.4996394Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:59.4996741Z 2025-05-07T20:01:59.4998722Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:59.4999889Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:59.5000690Z ^ 2025-05-07T20:01:59.5000933Z 2025-05-07T20:02:00.1166960Z [154/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared 
-Wl,-soname,fbgemm_gpu_experimental_gen_ai.so -o experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 
-Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -lrt -lpthread -ldl && : 2025-05-07T20:02:00.3728310Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:00.3730054Z ################################################################################ 2025-05-07T20:02:00.3730473Z [CMAKE] Running 
post-build script ... 2025-05-07T20:02:00.3731212Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:00.3731971Z Removing all RPATHs ... 2025-05-07T20:02:00.3732301Z ################################################################################ 2025-05-07T20:02:00.3733360Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-build && /github/home/miniconda/envs/build_binary/lib/python3.13/site-packages/cmake/data/bin/cmake -P cmake_install.cmake 2025-05-07T20:02:00.4622631Z -- Install configuration: "Release" 2025-05-07T20:02:00.4655555Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/asmjit.so 2025-05-07T20:02:00.4696784Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/fbgemm.so 2025-05-07T20:02:00.4735812Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:00.4750420Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench 2025-05-07T20:02:00.4773040Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/__init__.py 2025-05-07T20:02:00.4776679Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py 2025-05-07T20:02:00.4777808Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py 2025-05-07T20:02:00.4782692Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py 2025-05-07T20:02:00.4783849Z -- Installing: 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py 2025-05-07T20:02:00.4784928Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py 2025-05-07T20:02:00.4789049Z -- Up-to-date: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:00.4794394Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:00.4820513Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md 2025-05-07T20:02:00.4824036Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py 2025-05-07T20:02:00.4826294Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py 2025-05-07T20:02:00.4827399Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py 2025-05-07T20:02:00.4828514Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py 2025-05-07T20:02:00.4829602Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py 2025-05-07T20:02:00.4830659Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py 2025-05-07T20:02:00.4833427Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py 2025-05-07T20:02:00.4861435Z -- Installing: 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:00.4888874Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/__init__.py 2025-05-07T20:02:00.4889987Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/utils.py 2025-05-07T20:02:00.4935571Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T20:02:00.4939620Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T20:02:00.4940898Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T20:02:00.4942132Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T20:02:00.4943284Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T20:02:00.5219810Z 2025-05-07T20:02:00.8556323Z 2025-05-07T20:02:00.8576886Z copying fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/__init__.py 2025-05-07T20:02:00.8718751Z copying fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py 2025-05-07T20:02:00.8719710Z copying fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/enums.py 2025-05-07T20:02:00.8735579Z copying fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/metrics.py 2025-05-07T20:02:00.8741235Z copying fbgemm_gpu/permute_pooled_embedding_modules.py -> 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py 2025-05-07T20:02:00.8745005Z copying fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py 2025-05-07T20:02:00.8748669Z copying fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_comm.py 2025-05-07T20:02:00.8760455Z copying fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_utils.py 2025-05-07T20:02:00.8763175Z copying fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/runtime_monitor.py 2025-05-07T20:02:00.8770973Z copying fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sparse_ops.py 2025-05-07T20:02:00.8783687Z copying fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_configs.py 2025-05-07T20:02:00.8786986Z copying fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py 2025-05-07T20:02:00.8797397Z copying fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py 2025-05-07T20:02:00.8798524Z copying fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_utils.py 2025-05-07T20:02:00.8801342Z copying fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py 2025-05-07T20:02:00.8809930Z copying fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py 2025-05-07T20:02:00.8811573Z copying fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py 2025-05-07T20:02:00.8821979Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py 2025-05-07T20:02:00.8840455Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py 2025-05-07T20:02:00.8848002Z copying fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py 2025-05-07T20:02:00.8854036Z copying fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py 2025-05-07T20:02:00.8857527Z copying fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/uvm.py 2025-05-07T20:02:00.8866609Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config 2025-05-07T20:02:00.8897886Z copying fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/__init__.py 2025-05-07T20:02:00.8906672Z copying fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/feature_list.py 2025-05-07T20:02:00.8910943Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs 2025-05-07T20:02:00.8931662Z copying fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/__init__.py 2025-05-07T20:02:00.8947598Z copying fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/common.py 2025-05-07T20:02:00.8948638Z copying fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/examples.py 2025-05-07T20:02:00.8949586Z copying fbgemm_gpu/docs/jagged_tensor_ops.py -> 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py 2025-05-07T20:02:00.8950680Z copying fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py 2025-05-07T20:02:00.8951918Z copying fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py 2025-05-07T20:02:00.8957628Z copying fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/quantize_ops.py 2025-05-07T20:02:00.8962728Z copying fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/sparse_ops.py 2025-05-07T20:02:00.8990218Z copying fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/version.py 2025-05-07T20:02:00.8993818Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize 2025-05-07T20:02:00.9011553Z copying fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/__init__.py 2025-05-07T20:02:00.9022322Z copying fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 2025-05-07T20:02:00.9025739Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll 2025-05-07T20:02:00.9026510Z copying fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/__init__.py 2025-05-07T20:02:00.9033039Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe 2025-05-07T20:02:00.9049302Z copying fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/__init__.py 2025-05-07T20:02:00.9050133Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton 2025-05-07T20:02:00.9062161Z copying fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/__init__.py 
2025-05-07T20:02:00.9066391Z copying fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/common.py 2025-05-07T20:02:00.9070250Z copying fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize.py 2025-05-07T20:02:00.9078786Z copying fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize_ref.py 2025-05-07T20:02:00.9083542Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils 2025-05-07T20:02:00.9101299Z copying fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/__init__.py 2025-05-07T20:02:00.9104186Z copying fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/filestore.py 2025-05-07T20:02:00.9108990Z copying fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/loader.py 2025-05-07T20:02:00.9113638Z copying fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/torch_library.py 2025-05-07T20:02:00.9123881Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu 2025-05-07T20:02:00.9126127Z copying fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/__init__.py 2025-05-07T20:02:00.9129024Z copying fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py 2025-05-07T20:02:00.9136532Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta 2025-05-07T20:02:00.9138797Z copying fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/__init__.py 2025-05-07T20:02:00.9142508Z copying fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py 2025-05-07T20:02:00.9147550Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton 
2025-05-07T20:02:00.9172060Z copying fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/__init__.py 2025-05-07T20:02:00.9177511Z copying fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/common.py 2025-05-07T20:02:00.9182275Z copying fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py 2025-05-07T20:02:00.9186286Z copying fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py 2025-05-07T20:02:00.9190513Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py 2025-05-07T20:02:00.9196109Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py 2025-05-07T20:02:00.9223793Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py 2025-05-07T20:02:00.9227647Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py 2025-05-07T20:02:00.9229227Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py 2025-05-07T20:02:00.9234489Z copying fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py 2025-05-07T20:02:00.9239967Z copying fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py 2025-05-07T20:02:00.9243267Z copying fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py 2025-05-07T20:02:00.9251046Z copying fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py 2025-05-07T20:02:00.9254986Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench 2025-05-07T20:02:00.9255803Z copying fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/__init__.py 2025-05-07T20:02:00.9260306Z copying fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py 2025-05-07T20:02:00.9267263Z copying fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py 2025-05-07T20:02:00.9283389Z copying fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py 2025-05-07T20:02:00.9285225Z copying fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py 2025-05-07T20:02:00.9292840Z copying fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py 2025-05-07T20:02:00.9296341Z copying fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/reporter.py 2025-05-07T20:02:00.9300980Z copying fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py 2025-05-07T20:02:00.9306459Z copying fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py 2025-05-07T20:02:00.9310929Z copying fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py 2025-05-07T20:02:00.9317563Z copying fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/utils.py 2025-05-07T20:02:00.9322431Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache 2025-05-07T20:02:00.9324770Z copying fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/__init__.py 2025-05-07T20:02:00.9327819Z copying fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py 2025-05-07T20:02:00.9331763Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:00.9336678Z copying fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py 2025-05-07T20:02:00.9341320Z copying fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/common.py 2025-05-07T20:02:00.9345125Z copying fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/inference.py 2025-05-07T20:02:00.9351113Z copying fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/training.py 2025-05-07T20:02:00.9362402Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils 2025-05-07T20:02:00.9364020Z copying fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/__init__.py 2025-05-07T20:02:00.9373320Z copying fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/common.py 2025-05-07T20:02:00.9377096Z copying 
fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/offsets.py 2025-05-07T20:02:00.9381065Z copying fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/quantize.py 2025-05-07T20:02:00.9388533Z copying fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/requests.py 2025-05-07T20:02:00.9394583Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats 2025-05-07T20:02:00.9395415Z copying fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/__init__.py 2025-05-07T20:02:00.9399569Z copying fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py 2025-05-07T20:02:00.9406431Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:00.9408887Z copying fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py 2025-05-07T20:02:00.9411397Z copying fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py 2025-05-07T20:02:00.9416123Z creating directory _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged 2025-05-07T20:02:00.9417023Z copying fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/__init__.py 2025-05-07T20:02:00.9421126Z copying fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py 2025-05-07T20:02:00.9541359Z 2025-05-07T20:02:01.6075364Z INFO:root:running bdist_wheel 2025-05-07T20:02:01.7487172Z INFO:root:running build 2025-05-07T20:02:01.7497810Z INFO:root:running build_py 2025-05-07T20:02:01.7798661Z INFO:root:creating 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7847972Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7849442Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7850874Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7852174Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7853621Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7855192Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7856643Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7874869Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7878030Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7879416Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7892725Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7894530Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7896080Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7897599Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7899092Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7904436Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7906025Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7907803Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7927179Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7928839Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7930544Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7931913Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.7937386Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:02:01.7938533Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:02:01.7940148Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:02:01.7941290Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7945553Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7947079Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7948979Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/examples.py 
-> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7950516Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7952122Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7953912Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7955444Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7956981Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7958806Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:01.7960773Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:02:01.7961967Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:02:01.7963443Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:02:01.7965088Z INFO:root:creating 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll 2025-05-07T20:02:01.7966272Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll 2025-05-07T20:02:01.7968141Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe 2025-05-07T20:02:01.7969395Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe 2025-05-07T20:02:01.7971109Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:01.7972371Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:01.7973829Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:01.7975256Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:01.7976759Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:01.7978842Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:01.7980082Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:01.7982149Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:01.7983620Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:01.7985044Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:01.7986326Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:02:01.7987488Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:02:01.7989138Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:02:01.7991103Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:02:01.7992289Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:02:01.7993927Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:02:01.8002358Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8003587Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8005122Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8006721Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8008445Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8010171Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8011817Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8013506Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8015299Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8017103Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8018879Z 
INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8020718Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8022440Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8024151Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:01.8025464Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8026718Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8028358Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8029852Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8031354Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8032935Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8038804Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8040528Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8042061Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8048948Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8054847Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8056730Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:01.8058444Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:02:01.8059709Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:02:01.8065721Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:02:01.8067432Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:01.8068654Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:01.8070123Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:01.8072070Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:01.8073622Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:01.8075836Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:01.8077186Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:01.8078731Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:01.8080214Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:01.8081743Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:01.8083281Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:01.8084464Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:02:01.8085670Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:02:01.8087242Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:02:01.8088515Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:01.8089820Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:01.8091504Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:01.8092958Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:02:01.8094210Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:02:01.8095864Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:02:01.8839179Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.8883296Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:01.9162263Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:01.9170999Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.3429709Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3431122Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3444717Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3452260Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3464499Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3475558Z 
INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3488955Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.3509110Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3512810Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3520070Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3530686Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3542406Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3555800Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3577713Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.3583039Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.3591525Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.3601875Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:02:02.3603355Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:02:02.3639418Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:02:02.3644863Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example 2025-05-07T20:02:02.3652131Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.3653606Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.3661023Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.3687310Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.3702345Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.3709636Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.3720812Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3724106Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3725895Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3727253Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3728679Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3730199Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3731670Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3733032Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3734379Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3735857Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3737452Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3738959Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3740857Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3742390Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3743890Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3745460Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3747065Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3748684Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3769602Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3771260Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3772761Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3774078Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu 2025-05-07T20:02:02.3775532Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:02:02.3776983Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/config/feature_list.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config 2025-05-07T20:02:02.3778385Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3779894Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3781305Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3782723Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3784285Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3785851Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3787335Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3788765Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3790177Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs 2025-05-07T20:02:02.3791622Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:02:02.3793143Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize 2025-05-07T20:02:02.3794593Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll 2025-05-07T20:02:02.3795929Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe 2025-05-07T20:02:02.3797319Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:02.3798754Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:02.3812127Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:02.3814597Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton 2025-05-07T20:02:02.3816084Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:02.3817621Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/filestore.py -> 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:02.3819039Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:02.3820782Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils 2025-05-07T20:02:02.3822257Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:02:02.3823655Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu 2025-05-07T20:02:02.3825135Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:02:02.3826621Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta 2025-05-07T20:02:02.3828053Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3829525Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3831161Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 
2025-05-07T20:02:02.3832965Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3834861Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3836435Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3838080Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3839876Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3841652Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3843348Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3845114Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3846874Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3848540Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.3850171Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3851661Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3853181Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3854639Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3856238Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3857853Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3859346Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3860923Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3862487Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3864075Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3865598Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.3867039Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:02:02.3868534Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache 2025-05-07T20:02:02.3870033Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.3871416Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.3872817Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.3874245Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.3875701Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.3877140Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.3878570Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.3880032Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.3881505Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.3882943Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:02:02.3884496Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats 2025-05-07T20:02:02.3886070Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:02.3887681Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:02.3889337Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:02:02.3890996Z INFO:root:copying _skbuild/linux-x86_64-3.13/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged 2025-05-07T20:02:02.4008307Z INFO:skbuild:copied 90 files 2025-05-07T20:02:02.4008694Z INFO:root:running build_ext 2025-05-07T20:02:02.4665402Z INFO:root:installing to _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:02:02.4665948Z INFO:root:running install 2025-05-07T20:02:02.5240790Z INFO:root:running install_lib 2025-05-07T20:02:02.5318155Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:02:02.5338298Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu 2025-05-07T20:02:02.5339153Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/config 2025-05-07T20:02:02.5340495Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:02:02.5342156Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:02:02.5343398Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/docs 2025-05-07T20:02:02.5344565Z INFO:root:copying 
_skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5346335Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5347943Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5349563Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5351273Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5353030Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5354712Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5356386Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5357995Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/docs/version.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:02:02.5359180Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/quantize 2025-05-07T20:02:02.5360435Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:02:02.5362160Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:02:02.5363390Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll 2025-05-07T20:02:02.5364256Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/cpu 2025-05-07T20:02:02.5365484Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:02:02.5367095Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:02:02.5368346Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/meta 2025-05-07T20:02:02.5369569Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:02:02.5371237Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/meta/meta_sll.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:02:02.5372499Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5373745Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5375473Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5377297Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5379355Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5381257Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5383097Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5385078Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 
2025-05-07T20:02:02.5387091Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5389053Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5391029Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5393051Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5394964Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5396880Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:02:02.5398663Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll 2025-05-07T20:02:02.5399820Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe 2025-05-07T20:02:02.5400826Z INFO:root:creating 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5402112Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5403790Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5405578Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5407281Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5409034Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5410864Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5413510Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5415222Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/tbe_data_config.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5417071Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5418921Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5420744Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:02:02.5421997Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/cache 2025-05-07T20:02:02.5423329Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:02:02.5425132Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:02:02.5426453Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.5427329Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:02.5428630Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:02.5430511Z 
INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:02:02.5432316Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.5433958Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.5435620Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.5437291Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:02:02.5438553Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.5439785Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.5441474Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.5443169Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 
2025-05-07T20:02:02.5444877Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.5446584Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:02:02.5447863Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/stats 2025-05-07T20:02:02.5449257Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:02:02.5451001Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:02:02.5452712Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe 2025-05-07T20:02:02.5453917Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton 2025-05-07T20:02:02.5454774Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton/jagged 2025-05-07T20:02:02.5456065Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:02:02.5457903Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:02:02.5459727Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:02:02.5461357Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:02:02.5462968Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:02:02.5464636Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:02:02.5465935Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/utils 2025-05-07T20:02:02.5467114Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:02:02.5468737Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:02:02.5470339Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:02:02.5471976Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/utils/torch_library.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:02:02.5473576Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.5475073Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.5499067Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental 2025-05-07T20:02:02.5500160Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.5501832Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.6093180Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6094990Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6097017Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6098958Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6101298Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6103284Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6105205Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:02:02.6107098Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.6109061Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:02:02.6110444Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6111834Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6113701Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6115569Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6117553Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6119518Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6121407Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:02:02.6122831Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/example 2025-05-07T20:02:02.6124416Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:02:02.6126392Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:02:02.6128312Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/example/utils.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:02:02.6129722Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm 2025-05-07T20:02:02.6130670Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.6132206Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.6134234Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.6136294Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.6138414Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.6140562Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:02:02.6142327Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 
2025-05-07T20:02:02.6143885Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6145479Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6146984Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6148598Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6150342Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6152001Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6153535Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6155134Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6156716Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/sparse_ops.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6158274Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6159977Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6161706Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6163326Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6165007Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6166756Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6168564Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6170392Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> 
_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6172237Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6174049Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6175720Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6177304Z INFO:root:copying _skbuild/linux-x86_64-3.13/setuptools/lib.linux-x86_64-cpython-313/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:02:02.6178207Z INFO:skbuild:copied 115 files 2025-05-07T20:02:02.6178566Z INFO:root:running install_egg_info 2025-05-07T20:02:02.7581541Z INFO:root:running egg_info 2025-05-07T20:02:02.7643847Z INFO:root:creating fbgemm_gpu_genai_nightly.egg-info 2025-05-07T20:02:02.7654732Z INFO:root:writing fbgemm_gpu_genai_nightly.egg-info/PKG-INFO 2025-05-07T20:02:02.7943929Z INFO:root:writing dependency_links to fbgemm_gpu_genai_nightly.egg-info/dependency_links.txt 2025-05-07T20:02:02.8017142Z INFO:root:writing requirements to fbgemm_gpu_genai_nightly.egg-info/requires.txt 2025-05-07T20:02:02.8017854Z INFO:root:writing top-level names to fbgemm_gpu_genai_nightly.egg-info/top_level.txt 2025-05-07T20:02:02.8078578Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:02:02.8517993Z INFO:root:reading manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:02:02.8571782Z INFO:root:writing manifest file 
'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:02:02.8605064Z INFO:root:Copying fbgemm_gpu_genai_nightly.egg-info to _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu_genai_nightly-2025.5.7-py3.13.egg-info 2025-05-07T20:02:02.8646106Z INFO:root:running install_scripts 2025-05-07T20:02:02.8646552Z INFO:skbuild:copied 0 files 2025-05-07T20:02:10.3470531Z INFO:root:creating _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL 2025-05-07T20:02:10.3867965Z INFO:wheel:creating '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-y5sgrcjz/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl' and adding '_skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel' to it 2025-05-07T20:02:10.4084988Z INFO:wheel:adding 'fbgemm_gpu/__init__.py' 2025-05-07T20:02:10.4646868Z INFO:wheel:adding 'fbgemm_gpu/asmjit.so' 2025-05-07T20:02:10.4658835Z INFO:wheel:adding 'fbgemm_gpu/batched_unary_embeddings_ops.py' 2025-05-07T20:02:10.4659351Z INFO:wheel:adding 'fbgemm_gpu/enums.py' 2025-05-07T20:02:10.6268551Z INFO:wheel:adding 'fbgemm_gpu/fbgemm.so' 2025-05-07T20:02:10.6395054Z INFO:wheel:adding 'fbgemm_gpu/metrics.py' 2025-05-07T20:02:10.6396651Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules.py' 2025-05-07T20:02:10.6407299Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules_split.py' 2025-05-07T20:02:10.6415195Z INFO:wheel:adding 'fbgemm_gpu/quantize_comm.py' 2025-05-07T20:02:10.6418102Z INFO:wheel:adding 'fbgemm_gpu/quantize_utils.py' 2025-05-07T20:02:10.6421092Z INFO:wheel:adding 'fbgemm_gpu/runtime_monitor.py' 2025-05-07T20:02:10.6431705Z INFO:wheel:adding 'fbgemm_gpu/sparse_ops.py' 2025-05-07T20:02:10.6435227Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_configs.py' 2025-05-07T20:02:10.6437852Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_inference_converter.py' 2025-05-07T20:02:10.6439352Z INFO:wheel:adding 
'fbgemm_gpu/split_embedding_optimizer_ops.py' 2025-05-07T20:02:10.6440525Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_utils.py' 2025-05-07T20:02:10.6442294Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops.py' 2025-05-07T20:02:10.6445271Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_common.py' 2025-05-07T20:02:10.6466552Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_inference.py' 2025-05-07T20:02:10.6507499Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training.py' 2025-05-07T20:02:10.6512527Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py' 2025-05-07T20:02:10.6513949Z INFO:wheel:adding 'fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py' 2025-05-07T20:02:10.6515835Z INFO:wheel:adding 'fbgemm_gpu/tbe_input_multiplexer.py' 2025-05-07T20:02:10.6517165Z INFO:wheel:adding 'fbgemm_gpu/uvm.py' 2025-05-07T20:02:10.6518955Z INFO:wheel:adding 'fbgemm_gpu/config/__init__.py' 2025-05-07T20:02:10.6520659Z INFO:wheel:adding 'fbgemm_gpu/config/feature_list.py' 2025-05-07T20:02:10.6522164Z INFO:wheel:adding 'fbgemm_gpu/docs/__init__.py' 2025-05-07T20:02:10.6523374Z INFO:wheel:adding 'fbgemm_gpu/docs/common.py' 2025-05-07T20:02:10.6525080Z INFO:wheel:adding 'fbgemm_gpu/docs/examples.py' 2025-05-07T20:02:10.6527428Z INFO:wheel:adding 'fbgemm_gpu/docs/jagged_tensor_ops.py' 2025-05-07T20:02:10.6528986Z INFO:wheel:adding 'fbgemm_gpu/docs/merge_pooled_embedding_ops.py' 2025-05-07T20:02:10.6531054Z INFO:wheel:adding 'fbgemm_gpu/docs/permute_pooled_embedding_ops.py' 2025-05-07T20:02:10.6532600Z INFO:wheel:adding 'fbgemm_gpu/docs/quantize_ops.py' 2025-05-07T20:02:10.6538084Z INFO:wheel:adding 'fbgemm_gpu/docs/sparse_ops.py' 2025-05-07T20:02:10.6541837Z INFO:wheel:adding 'fbgemm_gpu/docs/version.py' 2025-05-07T20:02:10.6542664Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/__init__.py' 2025-05-07T20:02:10.6544384Z INFO:wheel:adding 
'fbgemm_gpu/experimental/bench/ck_bf16_bench.py' 2025-05-07T20:02:10.6547177Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/comm_bench.py' 2025-05-07T20:02:10.6550782Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/gather_scatter_bench.py' 2025-05-07T20:02:10.6556362Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_bench.py' 2025-05-07T20:02:10.6572564Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_ops.py' 2025-05-07T20:02:10.6573459Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/__init__.py' 2025-05-07T20:02:10.6724696Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so' 2025-05-07T20:02:10.6733132Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/utils.py' 2025-05-07T20:02:10.6733888Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py' 2025-05-07T20:02:10.6763395Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py' 2025-05-07T20:02:10.6769322Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py' 2025-05-07T20:02:10.6773160Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py' 2025-05-07T20:02:10.6775205Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/utils.py' 2025-05-07T20:02:10.6776966Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/__init__.py' 2025-05-07T20:02:12.6478930Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so' 2025-05-07T20:02:12.8465538Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/quantize.py' 2025-05-07T20:02:12.8466424Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/README.md' 2025-05-07T20:02:12.8468065Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/__init__.py' 2025-05-07T20:02:12.8486124Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/activation.py' 2025-05-07T20:02:12.8489085Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py' 2025-05-07T20:02:12.8498320Z INFO:wheel:adding 
'fbgemm_gpu/experimental/gen_ai/moe/layers.py' 2025-05-07T20:02:12.8502988Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/shuffling.py' 2025-05-07T20:02:12.8504764Z INFO:wheel:adding 'fbgemm_gpu/quantize/__init__.py' 2025-05-07T20:02:12.8506430Z INFO:wheel:adding 'fbgemm_gpu/quantize/quantize_ops.py' 2025-05-07T20:02:12.8508345Z INFO:wheel:adding 'fbgemm_gpu/sll/__init__.py' 2025-05-07T20:02:12.8510257Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/__init__.py' 2025-05-07T20:02:12.8516299Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/cpu_sll.py' 2025-05-07T20:02:12.8518569Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/__init__.py' 2025-05-07T20:02:12.8520898Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/meta_sll.py' 2025-05-07T20:02:12.8523210Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/__init__.py' 2025-05-07T20:02:12.8524649Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/common.py' 2025-05-07T20:02:12.8526325Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py' 2025-05-07T20:02:12.8528601Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py' 2025-05-07T20:02:12.8532166Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm.py' 2025-05-07T20:02:12.8535864Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py' 2025-05-07T20:02:12.8537784Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py' 2025-05-07T20:02:12.8539944Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py' 2025-05-07T20:02:12.8545435Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py' 2025-05-07T20:02:12.8550686Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py' 2025-05-07T20:02:12.8552703Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py' 2025-05-07T20:02:12.8556350Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_softmax.py' 2025-05-07T20:02:12.8561471Z INFO:wheel:adding 
'fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py' 2025-05-07T20:02:12.8563332Z INFO:wheel:adding 'fbgemm_gpu/tbe/__init__.py' 2025-05-07T20:02:12.8565242Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/__init__.py' 2025-05-07T20:02:12.8567154Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_config.py' 2025-05-07T20:02:12.8571891Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_runs.py' 2025-05-07T20:02:12.8574224Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eeg_cli.py' 2025-05-07T20:02:12.8576485Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/embedding_ops_common_config.py' 2025-05-07T20:02:12.8578208Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eval_compression.py' 2025-05-07T20:02:12.8579610Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/reporter.py' 2025-05-07T20:02:12.8582837Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config.py' 2025-05-07T20:02:12.8585350Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_loader.py' 2025-05-07T20:02:12.8587673Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py' 2025-05-07T20:02:12.8589310Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/utils.py' 2025-05-07T20:02:12.8590724Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/__init__.py' 2025-05-07T20:02:12.8592271Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py' 2025-05-07T20:02:12.8593598Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/__init__.py' 2025-05-07T20:02:12.8594813Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/common.py' 2025-05-07T20:02:12.8600647Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/inference.py' 2025-05-07T20:02:12.8625714Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/training.py' 2025-05-07T20:02:12.8628996Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/__init__.py' 2025-05-07T20:02:12.8631766Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py' 2025-05-07T20:02:12.8633247Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/__init__.py' 2025-05-07T20:02:12.8635825Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/bench_params_reporter.py' 
2025-05-07T20:02:12.8637397Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/__init__.py' 2025-05-07T20:02:12.8638768Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/common.py' 2025-05-07T20:02:12.8640318Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/offsets.py' 2025-05-07T20:02:12.8642679Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/quantize.py' 2025-05-07T20:02:12.8648061Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/requests.py' 2025-05-07T20:02:12.8649976Z INFO:wheel:adding 'fbgemm_gpu/triton/__init__.py' 2025-05-07T20:02:12.8651604Z INFO:wheel:adding 'fbgemm_gpu/triton/common.py' 2025-05-07T20:02:13.0786566Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize.py' 2025-05-07T20:02:13.0789984Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize_ref.py' 2025-05-07T20:02:13.0791998Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/__init__.py' 2025-05-07T20:02:13.0799951Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py' 2025-05-07T20:02:13.0802803Z INFO:wheel:adding 'fbgemm_gpu/utils/__init__.py' 2025-05-07T20:02:13.0805006Z INFO:wheel:adding 'fbgemm_gpu/utils/filestore.py' 2025-05-07T20:02:13.0806519Z INFO:wheel:adding 'fbgemm_gpu/utils/loader.py' 2025-05-07T20:02:13.0808706Z INFO:wheel:adding 'fbgemm_gpu/utils/torch_library.py' 2025-05-07T20:02:13.0811409Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/METADATA' 2025-05-07T20:02:13.0812261Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL' 2025-05-07T20:02:13.0813056Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/top_level.txt' 2025-05-07T20:02:13.0888395Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/RECORD' 2025-05-07T20:02:13.0889117Z INFO:root:removing _skbuild/linux-x86_64-3.13/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:02:13.2185933Z ╒════════════════════════════╤════════════════════════════════════════════════╕ 2025-05-07T20:02:13.2187607Z │ │ Version │ 2025-05-07T20:02:13.2189182Z 
╞════════════════════════════╪════════════════════════════════════════════════╡ 2025-05-07T20:02:13.2190732Z │ PyTorch │ 2.8.0.dev20250507+cu128 │ 2025-05-07T20:02:13.2192704Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:02:13.2194354Z │ CUDA (Declared by PyTorch) │ 12.8 │ 2025-05-07T20:02:13.2195204Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:02:13.2195718Z │ CUDA (Actual) │ nvcc: NVIDIA (R) Cuda compiler driver │ 2025-05-07T20:02:13.2196280Z │ │ Copyright (c) 2005-2025 NVIDIA Corporation │ 2025-05-07T20:02:13.2196770Z │ │ Built on Wed_Jan_15_19:20:09_PST_2025 │ 2025-05-07T20:02:13.2198576Z │ │ Cuda compilation tools, release 12.8, V12.8.61 │ 2025-05-07T20:02:13.2199096Z │ │ Build cuda_12.8.r12.8/compiler.35404655_0 │ 2025-05-07T20:02:13.2199624Z ╘════════════════════════════╧════════════════════════════════════════════════╛ 2025-05-07T20:02:22.9985604Z Successfully built fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:27.3569119Z 2025-05-07T20:02:27.4770436Z ################################################################################ 2025-05-07T20:02:27.4771319Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.4789858Z [CHECK] Listing out library size: 2025-05-07T20:02:27.4791702Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.4793217Z 2025-05-07T20:02:27.4920749Z 91 ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.4931631Z 2025-05-07T20:02:27.4975171Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.4979118Z + objdump -TC 
./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.4980968Z 2025-05-07T20:02:27.5957838Z GLIBC_2.2.5 2025-05-07T20:02:27.5958188Z GLIBC_2.3 2025-05-07T20:02:27.5958466Z GLIBC_2.14 2025-05-07T20:02:27.5958620Z 2025-05-07T20:02:27.5958625Z 2025-05-07T20:02:27.5959229Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.5960575Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.5961399Z 2025-05-07T20:02:27.6087075Z GLIBCXX_3.4 2025-05-07T20:02:27.6087797Z GLIBCXX_3.4.9 2025-05-07T20:02:27.6088469Z GLIBCXX_3.4.11 2025-05-07T20:02:27.6089130Z GLIBCXX_3.4.18 2025-05-07T20:02:27.6089745Z GLIBCXX_3.4.20 2025-05-07T20:02:27.6090398Z GLIBCXX_3.4.21 2025-05-07T20:02:27.6091022Z GLIBCXX_3.4.29 2025-05-07T20:02:27.6091405Z 2025-05-07T20:02:27.6091754Z 2025-05-07T20:02:27.6672270Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.4PuJ6KNQ1V.symbols.txt 2025-05-07T20:02:27.6674153Z 2025-05-07T20:02:27.6785017Z 2025-05-07T20:02:27.7048568Z [CHECK] Total Number of symbols: 1843 2025-05-07T20:02:27.7076231Z [CHECK] Number of fbgemm symbols: 619 2025-05-07T20:02:27.7099199Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.sc9yjJy69t.usymbols.txt 2025-05-07T20:02:27.7100110Z 2025-05-07T20:02:27.7127754Z 2025-05-07T20:02:27.7161120Z [CHECK] Listing out undefined symbols (252 total): 2025-05-07T20:02:27.7192531Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7194360Z U VTT for 
std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7195125Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:02:27.7195587Z U __assert_fail@GLIBC_2.2.5 2025-05-07T20:02:27.7196025Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:02:27.7196476Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:02:27.7196949Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:02:27.7197379Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:02:27.7197843Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:02:27.7198381Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:02:27.7198793Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:02:27.7199244Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:02:27.7199617Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:02:27.7200041Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:02:27.7200711Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:02:27.7201195Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:02:27.7201560Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:02:27.7201938Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:02:27.7202332Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:02:27.7202695Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:02:27.7203090Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:02:27.7203447Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:02:27.7203828Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:02:27.7204187Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:02:27.7204554Z U __udivti3@GCC_3.0 2025-05-07T20:02:27.7204893Z U __xstat@GLIBC_2.2.5 2025-05-07T20:02:27.7205295Z U at::CUDAGeneratorImpl::device_type() 2025-05-07T20:02:27.7205780Z U at::CUDAGeneratorImpl::philox_cuda_state(unsigned long) 2025-05-07T20:02:27.7206220Z U at::TensorMaker::make_tensor() 2025-05-07T20:02:27.7206731Z U at::_ops::add__Tensor::call(at::Tensor&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:02:27.7207269Z U at::_ops::div__Scalar::call(at::Tensor&, c10::Scalar const&) 
2025-05-07T20:02:27.7208232Z U at::_ops::empty_like::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:02:27.7209694Z U at::_ops::empty_memory_format::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:02:27.7210797Z U at::_ops::expand::call(at::Tensor const&, c10::ArrayRef, bool) 2025-05-07T20:02:27.7211466Z U at::_ops::index_select::call(at::Tensor const&, long, at::Tensor const&) 2025-05-07T20:02:27.7212062Z U at::_ops::norm_Scalar::call(at::Tensor const&, c10::Scalar const&) 2025-05-07T20:02:27.7212649Z U at::_ops::scatter_add_::call(at::Tensor&, long, at::Tensor const&, at::Tensor const&) 2025-05-07T20:02:27.7213266Z U at::_ops::select_int::call(at::Tensor const&, long, c10::SymInt) 2025-05-07T20:02:27.7213876Z U at::_ops::split_sizes::call(at::Tensor const&, c10::ArrayRef, long) 2025-05-07T20:02:27.7214639Z U at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef, bool, std::optional) 2025-05-07T20:02:27.7215522Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:02:27.7216731Z U at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, bool, bool, std::optional) 2025-05-07T20:02:27.7218906Z U at::_ops::unsqueeze::call(at::Tensor const&, long) 2025-05-07T20:02:27.7219510Z U at::_ops::view::call(at::Tensor const&, c10::ArrayRef) 2025-05-07T20:02:27.7220552Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:02:27.7221366Z U at::cuda::detail::getDefaultCUDAGenerator(signed char) 2025-05-07T20:02:27.7221941Z U at::cuda::getCurrentDeviceProperties() 2025-05-07T20:02:27.7222400Z U at::tensor(c10::ArrayRef, c10::TensorOptions const&) 2025-05-07T20:02:27.7223014Z U c10::AutogradMetaInterface::~AutogradMetaInterface() 2025-05-07T20:02:27.7223571Z U c10::BFloat16* 
at::TensorBase::data_ptr() const 2025-05-07T20:02:27.7224113Z U c10::BFloat16* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:02:27.7224610Z U c10::BoolType::get() 2025-05-07T20:02:27.7225236Z U c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) 2025-05-07T20:02:27.7225900Z U c10::Error::what() const 2025-05-07T20:02:27.7226417Z U c10::Float8_e4m3fn* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:02:27.7226919Z U c10::FloatType::get() 2025-05-07T20:02:27.7227361Z U c10::GeneratorImpl::device() const 2025-05-07T20:02:27.7227775Z U c10::IValue::isTensorList() const 2025-05-07T20:02:27.7228181Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:02:27.7228603Z U c10::IntType::get() 2025-05-07T20:02:27.7229376Z U c10::ListType::get(std::__cxx11::basic_string, std::allocator > const&, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:02:27.7230189Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:02:27.7230674Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:02:27.7231162Z U c10::OptionalType::get(c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:02:27.7231686Z U c10::ScalarTypeType::get() 2025-05-07T20:02:27.7232133Z U c10::StorageImpl::throw_data_ptr_access_error() const 2025-05-07T20:02:27.7232550Z U c10::StringType::get() 2025-05-07T20:02:27.7232968Z U c10::SymBool::guard_bool(char const*, long) const 2025-05-07T20:02:27.7233408Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:02:27.7234202Z U c10::SymInt::SymInt(c10::intrusive_ptr >) 2025-05-07T20:02:27.7234954Z U c10::SymInt::guard_int(char const*, long) const 2025-05-07T20:02:27.7235358Z U c10::SymInt::toSymNode() const 2025-05-07T20:02:27.7235928Z U c10::SymbolicShapeMeta::init_is_contiguous() const 2025-05-07T20:02:27.7236677Z U c10::TensorImpl::set_autograd_meta(std::unique_ptr >) 2025-05-07T20:02:27.7237453Z U c10::TensorImpl::throw_data_ptr_access_error() const 2025-05-07T20:02:27.7237894Z U 
c10::TensorType::get() 2025-05-07T20:02:27.7238255Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:02:27.7239325Z U c10::Warning::Warning(std::variant, c10::SourceLocation const&, std::__cxx11::basic_string, std::allocator >, bool) 2025-05-07T20:02:27.7240330Z U c10::cuda::CUDACachingAllocator::allocator 2025-05-07T20:02:27.7240802Z U c10::cuda::CUDAStream::stream() const 2025-05-07T20:02:27.7241409Z U c10::cuda::ExchangeDevice(signed char) 2025-05-07T20:02:27.7241805Z U c10::cuda::GetDevice(signed char*) 2025-05-07T20:02:27.7242256Z U c10::cuda::MaybeSetDevice(signed char) 2025-05-07T20:02:27.7242639Z U c10::cuda::SetDevice(signed char) 2025-05-07T20:02:27.7243202Z U c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) 2025-05-07T20:02:27.7243798Z U c10::cuda::current_device() 2025-05-07T20:02:27.7244163Z U c10::cuda::device_count() 2025-05-07T20:02:27.7244596Z U c10::cuda::getCurrentCUDAStream(signed char) 2025-05-07T20:02:27.7245101Z U c10::cuda::getDefaultCUDAStream(signed char) 2025-05-07T20:02:27.7245584Z U c10::cuda::getStreamFromPool(bool, signed char) 2025-05-07T20:02:27.7246032Z U c10::cuda::getStreamFromPool(int, signed char) 2025-05-07T20:02:27.7246532Z U c10::cuda::setCurrentCUDAStream(c10::cuda::CUDAStream) 2025-05-07T20:02:27.7247009Z U c10::cuda::warn_or_error_on_sync() 2025-05-07T20:02:27.7247728Z U c10::detail::ListImpl::ListImpl(std::vector >, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:02:27.7248904Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:02:27.7249899Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:02:27.7250836Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:02:27.7251893Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:02:27.7253043Z U 
c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:02:27.7253915Z U c10::get_default_dtype() 2025-05-07T20:02:27.7254491Z U c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKeySet) 2025-05-07T20:02:27.7255139Z U c10::impl::ExcludeDispatchKeyGuard::~ExcludeDispatchKeyGuard() 2025-05-07T20:02:27.7255660Z U c10::impl::GPUTrace::gpuTraceState 2025-05-07T20:02:27.7256084Z U c10::impl::GPUTrace::haveState 2025-05-07T20:02:27.7256505Z U c10::impl::cow::is_cow_data_ptr(c10::DataPtr const&) 2025-05-07T20:02:27.7257057Z U c10::impl::cow::materialize_cow_storage(c10::StorageImpl&) 2025-05-07T20:02:27.7257513Z U c10::impl::device_guard_impl_registry 2025-05-07T20:02:27.7257932Z U c10::operator*(c10::SymInt const&, int) 2025-05-07T20:02:27.7258322Z U c10::operator-(c10::SymInt const&, int) 2025-05-07T20:02:27.7258742Z U c10::operator-(c10::SymInt const&, long) 2025-05-07T20:02:27.7259185Z U c10::operator<<(std::ostream&, c10::Device const&) 2025-05-07T20:02:27.7259712Z U c10::operator<<(std::ostream&, c10::DeviceType) 2025-05-07T20:02:27.7260156Z U c10::throwNullDataPtrError() 2025-05-07T20:02:27.7260519Z U c10::warn(c10::Warning const&) 2025-05-07T20:02:27.7260917Z U c10::warnDeprecatedDataPtr() 2025-05-07T20:02:27.7261740Z U c10d::getNcclErrorDetailStr(ncclResult_t, std::optional, std::allocator > >) 2025-05-07T20:02:27.7262545Z U c10d::ncclGetErrorWithVersion[abi:cxx11](ncclResult_t) 2025-05-07T20:02:27.7263123Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:02:27.7263571Z U caffe2::TypeMeta::typeMetaDatas() 2025-05-07T20:02:27.7263924Z U cublasLtCreate 2025-05-07T20:02:27.7264233Z U cublasLtMatmul 2025-05-07T20:02:27.7264548Z U cublasLtMatmulAlgoGetHeuristic 2025-05-07T20:02:27.7264919Z U cublasLtMatmulDescCreate 2025-05-07T20:02:27.7265330Z U cublasLtMatmulDescSetAttribute 2025-05-07T20:02:27.7265707Z U 
cublasLtMatmulPreferenceCreate 2025-05-07T20:02:27.7266086Z U cublasLtMatmulPreferenceSetAttribute 2025-05-07T20:02:27.7266484Z U cublasLtMatrixLayoutCreate 2025-05-07T20:02:27.7266879Z U cudaDeviceGetAttribute@libcudart.so.12 2025-05-07T20:02:27.7267331Z U cudaDeviceSynchronize@libcudart.so.12 2025-05-07T20:02:27.7267788Z U cudaEventCreateWithFlags@libcudart.so.12 2025-05-07T20:02:27.7268196Z U cudaEventDestroy@libcudart.so.12 2025-05-07T20:02:27.7268615Z U cudaEventElapsedTime@libcudart.so.12 2025-05-07T20:02:27.7269002Z U cudaEventQuery@libcudart.so.12 2025-05-07T20:02:27.7269405Z U cudaEventRecord@libcudart.so.12 2025-05-07T20:02:27.7269794Z U cudaEventSynchronize@libcudart.so.12 2025-05-07T20:02:27.7270179Z U cudaFree@libcudart.so.12 2025-05-07T20:02:27.7270528Z U cudaFuncSetAttribute@libcudart.so.12 2025-05-07T20:02:27.7270903Z U cudaGetDevice@libcudart.so.12 2025-05-07T20:02:27.7271299Z U cudaGetDeviceProperties_v2@libcudart.so.12 2025-05-07T20:02:27.7271703Z U cudaGetDriverEntryPoint@libcudart.so.12 2025-05-07T20:02:27.7272206Z U cudaGetErrorName@libcudart.so.12 2025-05-07T20:02:27.7272546Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:02:27.7272900Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:02:27.7273238Z U cudaIpcGetMemHandle@libcudart.so.12 2025-05-07T20:02:27.7273604Z U cudaIpcOpenMemHandle@libcudart.so.12 2025-05-07T20:02:27.7273989Z U cudaLaunchCooperativeKernel@libcudart.so.12 2025-05-07T20:02:27.7274350Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:02:27.7274860Z U cudaLaunchKernelExC@libcudart.so.12 2025-05-07T20:02:27.7275205Z U cudaMalloc@libcudart.so.12 2025-05-07T20:02:27.7275559Z U cudaMemcpy@libcudart.so.12 2025-05-07T20:02:27.7275885Z U cudaMemcpyAsync@libcudart.so.12 2025-05-07T20:02:27.7276246Z U cudaMemsetAsync@libcudart.so.12 2025-05-07T20:02:27.7276581Z U cudaStreamQuery@libcudart.so.12 2025-05-07T20:02:27.7276980Z U cudaStreamSynchronize@libcudart.so.12 2025-05-07T20:02:27.7277350Z U cudaStreamWaitEvent@libcudart.so.12 
2025-05-07T20:02:27.7277661Z U exit@GLIBC_2.2.5 2025-05-07T20:02:27.7277953Z U fclose@GLIBC_2.2.5 2025-05-07T20:02:27.7278230Z U fflush@GLIBC_2.2.5 2025-05-07T20:02:27.7278577Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:02:27.7278972Z U float* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:02:27.7279343Z U fopen@GLIBC_2.2.5 2025-05-07T20:02:27.7279637Z U fprintf@GLIBC_2.2.5 2025-05-07T20:02:27.7279915Z U fread@GLIBC_2.2.5 2025-05-07T20:02:27.7280205Z U fwrite@GLIBC_2.2.5 2025-05-07T20:02:27.7280513Z U int* at::TensorBase::data_ptr() const 2025-05-07T20:02:27.7280933Z U int* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:02:27.7281355Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:02:27.7281825Z U long* at::TensorBase::data_ptr() const 2025-05-07T20:02:27.7282175Z U memcmp@GLIBC_2.2.5 2025-05-07T20:02:27.7282454Z U memcpy@GLIBC_2.14 2025-05-07T20:02:27.7282754Z U memmove@GLIBC_2.2.5 2025-05-07T20:02:27.7283038Z U memset@GLIBC_2.2.5 2025-05-07T20:02:27.7283339Z U ncclAllGather 2025-05-07T20:02:27.7283603Z U ncclAllReduce 2025-05-07T20:02:27.7283918Z U ncclCommInitRank 2025-05-07T20:02:27.7284210Z U ncclGetUniqueId 2025-05-07T20:02:27.7284537Z U ncclReduceScatter 2025-05-07T20:02:27.7284888Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:02:27.7285322Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:02:27.7285711Z U printf@GLIBC_2.2.5 2025-05-07T20:02:27.7286089Z U signed char* at::TensorBase::data_ptr() const 2025-05-07T20:02:27.7286613Z U signed char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:02:27.7287283Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:02:27.7288159Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:02:27.7289033Z U std::__cxx11::basic_stringbuf, std::allocator >::str() const &@GLIBCXX_3.4.29 2025-05-07T20:02:27.7289859Z U 
std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:02:27.7290511Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:02:27.7290912Z U std::__throw_bad_array_new_length() 2025-05-07T20:02:27.7291269Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:02:27.7291702Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:02:27.7292120Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:02:27.7292578Z U std::__throw_out_of_range_fmt(char const*, ...)@GLIBCXX_3.4.20 2025-05-07T20:02:27.7293022Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:02:27.7293557Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:02:27.7294283Z U std::basic_ios >::init(std::basic_streambuf >*)@GLIBCXX_3.4 2025-05-07T20:02:27.7309335Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.7310731Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char const*)@GLIBCXX_3.4 2025-05-07T20:02:27.7311493Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:02:27.7311811Z U std::cout@GLIBCXX_3.4 2025-05-07T20:02:27.7312184Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:02:27.7312780Z U std::exception::what() const@GLIBCXX_3.4 2025-05-07T20:02:27.7313204Z U std::exception::~exception()@GLIBCXX_3.4 2025-05-07T20:02:27.7313586Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:02:27.7313984Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:02:27.7314353Z U std::ios_base::ios_base()@GLIBCXX_3.4 2025-05-07T20:02:27.7314811Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:02:27.7315172Z U std::locale::locale()@GLIBCXX_3.4 2025-05-07T20:02:27.7315610Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:02:27.7316043Z U std::logic_error::logic_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:02:27.7316468Z U std::logic_error::~logic_error()@GLIBCXX_3.4 
2025-05-07T20:02:27.7316938Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.7317486Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.7318167Z U std::ostream& std::ostream::_M_insert(void const*)@GLIBCXX_3.4.9 2025-05-07T20:02:27.7318658Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:02:27.7319022Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:02:27.7319417Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:02:27.7319833Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:02:27.7320586Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:02:27.7321301Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:02:27.7321678Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:02:27.7322037Z U stderr@GLIBC_2.2.5 2025-05-07T20:02:27.7322342Z U strlen@GLIBC_2.2.5 2025-05-07T20:02:27.7322700Z U torch::CppFunction::~CppFunction() 2025-05-07T20:02:27.7323516Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:02:27.7324689Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:02:27.7325552Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:02:27.7326424Z U torch::cuda::nccl::all2all(std::vector >&, std::vector >&, void*, c10::cuda::CUDAStream&) 2025-05-07T20:02:27.7327395Z U torch::cuda::nccl::all2all_single_equal_split(at::Tensor&, at::Tensor&, int, void*, c10::cuda::CUDAStream&) 2025-05-07T20:02:27.7328225Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:02:27.7328814Z U typeinfo for c10::Error 2025-05-07T20:02:27.7329207Z U typeinfo for std::exception@GLIBCXX_3.4 2025-05-07T20:02:27.7329617Z U typeinfo for 
std::logic_error@GLIBCXX_3.4 2025-05-07T20:02:27.7330003Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:02:27.7330529Z U unsigned char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:02:27.7330978Z U usleep@GLIBC_2.2.5 2025-05-07T20:02:27.7331372Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:02:27.7331814Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:02:27.7332291Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:02:27.7332706Z U vtable for c10::Error 2025-05-07T20:02:27.7333266Z U vtable for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7334087Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7334935Z U vtable for std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7335622Z U vtable for std::basic_ios >@GLIBCXX_3.4 2025-05-07T20:02:27.7336201Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:02:27.7336677Z U vtable for torch::autograd::AutogradMeta 2025-05-07T20:02:27.7337069Z w _ITM_deregisterTMCloneTable 2025-05-07T20:02:27.7337411Z w _ITM_registerTMCloneTable 2025-05-07T20:02:27.7337773Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:02:27.7338162Z w __gmon_start__ 2025-05-07T20:02:27.7338451Z w __pthread_key_create 2025-05-07T20:02:27.7338803Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:02:27.7339146Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:02:27.7339678Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:02:27.7340576Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.7341072Z 2025-05-07T20:02:27.7341227Z linux-vdso.so.1 (0x00007ffca2ec4000) 2025-05-07T20:02:27.7341606Z libtorch.so => not found 2025-05-07T20:02:27.7341894Z libc10.so => not found 2025-05-07T20:02:27.7342211Z libc10_cuda.so => not found 2025-05-07T20:02:27.7342507Z 
libnccl.so.2 => not found 2025-05-07T20:02:27.7342835Z libtorch_cpu.so => not found 2025-05-07T20:02:27.7343140Z libtorch_cuda.so => not found 2025-05-07T20:02:27.7343475Z libcudart.so.12 => not found 2025-05-07T20:02:27.7343849Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f449cb9c000) 2025-05-07T20:02:27.7344332Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f44a2b9c000) 2025-05-07T20:02:27.7344783Z libc.so.6 => /lib64/libc.so.6 (0x00007f449c994000) 2025-05-07T20:02:27.7345181Z /lib64/ld-linux-x86-64.so.2 (0x00007f44a2bd0000) 2025-05-07T20:02:27.7345607Z libm.so.6 => /lib64/libm.so.6 (0x00007f44a2ac1000) 2025-05-07T20:02:27.7345860Z 2025-05-07T20:02:27.7345994Z [CHECK] Displaying ELF information: 2025-05-07T20:02:27.7346619Z + readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:02:27.7347109Z 2025-05-07T20:02:27.7420968Z 2025-05-07T20:02:27.7422217Z Dynamic section at offset 0x5a661e0 contains 38 entries: 2025-05-07T20:02:27.7423442Z Tag Type Name/Value 2025-05-07T20:02:27.7424756Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:02:27.7426303Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:02:27.7427939Z 0x0000000000000001 (NEEDED) Shared library: [libc10_cuda.so] 2025-05-07T20:02:27.7428517Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:02:27.7429075Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:02:27.7429678Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:02:27.7430486Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:02:27.7431073Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:02:27.7431662Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:02:27.7432209Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:02:27.7432799Z 0x0000000000000001 (NEEDED) Shared 
library: [ld-linux-x86-64.so.2] 2025-05-07T20:02:27.7433425Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_gen_ai.so] 2025-05-07T20:02:27.7433983Z 0x000000000000000c (INIT) 0x59000 2025-05-07T20:02:27.7434432Z 0x000000000000000d (FINI) 0x4a1fac 2025-05-07T20:02:27.7434815Z 0x0000000000000019 (INIT_ARRAY) 0x5a658a0 2025-05-07T20:02:27.7435317Z 0x000000000000001b (INIT_ARRAYSZ) 1136 (bytes) 2025-05-07T20:02:27.7435727Z 0x000000000000001a (FINI_ARRAY) 0x5a65d10 2025-05-07T20:02:27.7436157Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:02:27.7436588Z 0x0000000000000004 (HASH) 0x238 2025-05-07T20:02:27.7436990Z 0x000000006ffffef5 (GNU_HASH) 0x2ef0 2025-05-07T20:02:27.7437366Z 0x0000000000000005 (STRTAB) 0x10c88 2025-05-07T20:02:27.7437774Z 0x0000000000000006 (SYMTAB) 0x5fa8 2025-05-07T20:02:27.7438207Z 0x000000000000000a (STRSZ) 229126 (bytes) 2025-05-07T20:02:27.7438607Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:02:27.7439077Z 0x0000000000000003 (PLTGOT) 0x5a67490 2025-05-07T20:02:27.7439486Z 0x0000000000000002 (PLTRELSZ) 19224 (bytes) 2025-05-07T20:02:27.7440026Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:02:27.7440381Z 0x0000000000000017 (JMPREL) 0x53750 2025-05-07T20:02:27.7440772Z 0x0000000000000007 (RELA) 0x49b38 2025-05-07T20:02:27.7441144Z 0x0000000000000008 (RELASZ) 39960 (bytes) 2025-05-07T20:02:27.7441568Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:02:27.7441964Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:02:27.7442319Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:02:27.7442733Z 0x000000006ffffffe (VERNEED) 0x499f8 2025-05-07T20:02:27.7443093Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:02:27.7443476Z 0x000000006ffffff0 (VERSYM) 0x48b8e 2025-05-07T20:02:27.7443834Z 0x000000006ffffff9 (RELACOUNT) 215 2025-05-07T20:02:27.7444206Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:02:27.7444443Z 2025-05-07T20:02:27.7444607Z 
################################################################################ 2025-05-07T20:02:27.7444850Z 2025-05-07T20:02:27.7444853Z 2025-05-07T20:02:27.7444980Z ################################################################################ 2025-05-07T20:02:27.7445698Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7446373Z [CHECK] Listing out library size: 2025-05-07T20:02:27.7447137Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7447831Z 2025-05-07T20:02:27.7451699Z 1 ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7452178Z 2025-05-07T20:02:27.7452748Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7454146Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.7454932Z 2025-05-07T20:02:27.7515519Z GLIBC_2.2.5 2025-05-07T20:02:27.7516148Z GLIBC_2.14 2025-05-07T20:02:27.7518192Z 2025-05-07T20:02:27.7518360Z 2025-05-07T20:02:27.7518988Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7520435Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.7521256Z 2025-05-07T20:02:27.7575498Z GLIBCXX_3.4 2025-05-07T20:02:27.7575830Z GLIBCXX_3.4.9 2025-05-07T20:02:27.7576108Z GLIBCXX_3.4.21 2025-05-07T20:02:27.7577402Z 2025-05-07T20:02:27.7577407Z 
2025-05-07T20:02:27.7603197Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.potFh7beOl.symbols.txt 2025-05-07T20:02:27.7603881Z 2025-05-07T20:02:27.7621925Z 2025-05-07T20:02:27.7655438Z [CHECK] Total Number of symbols: 155 2025-05-07T20:02:27.7669438Z [CHECK] Number of fbgemm symbols: 19 2025-05-07T20:02:27.7692566Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.6r2SQw7tBn.usymbols.txt 2025-05-07T20:02:27.7693589Z 2025-05-07T20:02:27.7718510Z 2025-05-07T20:02:27.7757601Z [CHECK] Listing out undefined symbols (76 total): 2025-05-07T20:02:27.7777685Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7779739Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:02:27.7780820Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:02:27.7781750Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:02:27.7782172Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:02:27.7782571Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:02:27.7782998Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:02:27.7783372Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:02:27.7783768Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:02:27.7784159Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:02:27.7784498Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:02:27.7784843Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:02:27.7785168Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:02:27.7785504Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:02:27.7786101Z U at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:02:27.7786756Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:02:27.7787643Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, 
std::optional) 2025-05-07T20:02:27.7788302Z U c10::FloatType::get() 2025-05-07T20:02:27.7788653Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:02:27.7789075Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:02:27.7789459Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:02:27.7789836Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:02:27.7790184Z U c10::TensorType::get() 2025-05-07T20:02:27.7790513Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:02:27.7791234Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:02:27.7792091Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:02:27.7792876Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:02:27.7793509Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:02:27.7793942Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:02:27.7794297Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:02:27.7794630Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:02:27.7795001Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:02:27.7795410Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:02:27.7795793Z U memcpy@GLIBC_2.14 2025-05-07T20:02:27.7796072Z U memmove@GLIBC_2.2.5 2025-05-07T20:02:27.7796368Z U memset@GLIBC_2.2.5 2025-05-07T20:02:27.7796643Z U ncclCommDestroy 2025-05-07T20:02:27.7797004Z U ncclCommInitAll 2025-05-07T20:02:27.7797345Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:02:27.7797764Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:02:27.7798387Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:02:27.7798984Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:02:27.7799354Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 
2025-05-07T20:02:27.7799751Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:02:27.7800430Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:02:27.7801415Z U std::basic_ios >::init(std::basic_streambuf >*)@GLIBCXX_3.4 2025-05-07T20:02:27.7802517Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.7803384Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:02:27.7803755Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:02:27.7804143Z U std::ios_base::ios_base()@GLIBCXX_3.4 2025-05-07T20:02:27.7804507Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:02:27.7804881Z U std::locale::locale()@GLIBCXX_3.4 2025-05-07T20:02:27.7805230Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:02:27.7805670Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.7806139Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:02:27.7806831Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:02:27.7807753Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:02:27.7808099Z U strlen@GLIBC_2.2.5 2025-05-07T20:02:27.7808425Z U torch::CppFunction::~CppFunction() 2025-05-07T20:02:27.7809237Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:02:27.7810360Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:02:27.7811185Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:02:27.7811920Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:02:27.7812517Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:02:27.7812978Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 
2025-05-07T20:02:27.7813392Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:02:27.7813826Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:02:27.7814443Z U vtable for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7815198Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:02:27.7815841Z U vtable for std::basic_ios >@GLIBCXX_3.4 2025-05-07T20:02:27.7816380Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:02:27.7816861Z w _ITM_deregisterTMCloneTable 2025-05-07T20:02:27.7817198Z w _ITM_registerTMCloneTable 2025-05-07T20:02:27.7817501Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:02:27.7817847Z w __gmon_start__ 2025-05-07T20:02:27.7818110Z w __pthread_key_create 2025-05-07T20:02:27.7818459Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:02:27.7819044Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7819579Z 2025-05-07T20:02:27.7836062Z linux-vdso.so.1 (0x00007ffe25de6000) 2025-05-07T20:02:27.7837059Z libc10.so => not found 2025-05-07T20:02:27.7838082Z libnccl.so.2 => not found 2025-05-07T20:02:27.7838873Z libtorch_cpu.so => not found 2025-05-07T20:02:27.7839673Z libtorch_cuda.so => not found 2025-05-07T20:02:27.7840472Z libcudart.so.12 => not found 2025-05-07T20:02:27.7841242Z libtorch.so => not found 2025-05-07T20:02:27.7842209Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fb5b4820000) 2025-05-07T20:02:27.7843494Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb5b47f2000) 2025-05-07T20:02:27.7844686Z libc.so.6 => /lib64/libc.so.6 (0x00007fb5b45ea000) 2025-05-07T20:02:27.7845055Z libm.so.6 => /lib64/libm.so.6 (0x00007fb5b450f000) 2025-05-07T20:02:27.7845404Z /lib64/ld-linux-x86-64.so.2 (0x00007fb5b4aff000) 2025-05-07T20:02:27.7845655Z 2025-05-07T20:02:27.7845762Z [CHECK] Displaying ELF information: 2025-05-07T20:02:27.7846333Z + 
readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:02:27.7846824Z 2025-05-07T20:02:27.7877819Z 2025-05-07T20:02:27.7878484Z Dynamic section at offset 0x739a8 contains 36 entries: 2025-05-07T20:02:27.7879643Z Tag Type Name/Value 2025-05-07T20:02:27.7880928Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:02:27.7882447Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:02:27.7884012Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:02:27.7884984Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:02:27.7885536Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:02:27.7886061Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:02:27.7886593Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:02:27.7887126Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:02:27.7887638Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:02:27.7888239Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_example_py.so] 2025-05-07T20:02:27.7888749Z 0x000000000000000c (INIT) 0x6000 2025-05-07T20:02:27.7889098Z 0x000000000000000d (FINI) 0xbadc 2025-05-07T20:02:27.7889437Z 0x0000000000000019 (INIT_ARRAY) 0x738b8 2025-05-07T20:02:27.7889884Z 0x000000000000001b (INIT_ARRAYSZ) 32 (bytes) 2025-05-07T20:02:27.7890349Z 0x000000000000001a (FINI_ARRAY) 0x738d8 2025-05-07T20:02:27.7890708Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:02:27.7891174Z 0x0000000000000004 (HASH) 0x200 2025-05-07T20:02:27.7891485Z 0x000000006ffffef5 (GNU_HASH) 0x900 2025-05-07T20:02:27.7891813Z 0x0000000000000005 (STRTAB) 0x1b70 2025-05-07T20:02:27.7892438Z 0x0000000000000006 (SYMTAB) 0xcd0 2025-05-07T20:02:27.7892783Z 0x000000000000000a (STRSZ) 10385 (bytes) 2025-05-07T20:02:27.7893314Z 0x000000000000000b 
(SYMENT) 24 (bytes) 2025-05-07T20:02:27.7893670Z 0x0000000000000003 (PLTGOT) 0x73c38 2025-05-07T20:02:27.7894035Z 0x0000000000000002 (PLTRELSZ) 1872 (bytes) 2025-05-07T20:02:27.7894389Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:02:27.7894929Z 0x0000000000000017 (JMPREL) 0x4cb8 2025-05-07T20:02:27.7895265Z 0x0000000000000007 (RELA) 0x4610 2025-05-07T20:02:27.7895631Z 0x0000000000000008 (RELASZ) 1704 (bytes) 2025-05-07T20:02:27.7896025Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:02:27.7896375Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:02:27.7896710Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:02:27.7897084Z 0x000000006ffffffe (VERNEED) 0x4540 2025-05-07T20:02:27.7897440Z 0x000000006fffffff (VERNEEDNUM) 4 2025-05-07T20:02:27.7897777Z 0x000000006ffffff0 (VERSYM) 0x4402 2025-05-07T20:02:27.7898171Z 0x000000006ffffff9 (RELACOUNT) 7 2025-05-07T20:02:27.7898494Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:02:27.7898720Z 2025-05-07T20:02:27.7898841Z ################################################################################ 2025-05-07T20:02:27.7899075Z 2025-05-07T20:02:27.7899079Z 2025-05-07T20:02:27.7899200Z ################################################################################ 2025-05-07T20:02:27.7899773Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.7900466Z [CHECK] Listing out library size: 2025-05-07T20:02:27.7900877Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.7901214Z 2025-05-07T20:02:27.7901376Z 1 ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.7901627Z 2025-05-07T20:02:27.7901962Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.7902861Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.7903412Z 
2025-05-07T20:02:27.8006166Z GLIBC_2.2.5 2025-05-07T20:02:27.8006820Z GLIBC_2.14 2025-05-07T20:02:27.8007209Z 2025-05-07T20:02:27.8007222Z 2025-05-07T20:02:27.8008299Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.8011073Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.8013055Z 2025-05-07T20:02:27.8072185Z 2025-05-07T20:02:27.8072325Z 2025-05-07T20:02:27.8094742Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so > /tmp/tmp.yKc2Y7B9U5.symbols.txt 2025-05-07T20:02:27.8095220Z 2025-05-07T20:02:27.8125132Z 2025-05-07T20:02:27.8150995Z [CHECK] Total Number of symbols: 803 2025-05-07T20:02:27.8168929Z [CHECK] Number of fbgemm symbols: 0 2025-05-07T20:02:27.8186319Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so > /tmp/tmp.e0jLhcXdu3.usymbols.txt 2025-05-07T20:02:27.8186787Z 2025-05-07T20:02:27.8203422Z 2025-05-07T20:02:27.8230108Z [CHECK] Listing out undefined symbols (49 total): 2025-05-07T20:02:27.8248031Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:02:27.8249132Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:02:27.8250443Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:02:27.8251448Z U __errno_location@GLIBC_2.2.5 2025-05-07T20:02:27.8252415Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:02:27.8253389Z U __popcountdi2@GCC_3.4 2025-05-07T20:02:27.8254264Z U abort@GLIBC_2.2.5 2025-05-07T20:02:27.8254941Z U close@GLIBC_2.2.5 2025-05-07T20:02:27.8255251Z U fputs@GLIBC_2.2.5 2025-05-07T20:02:27.8255590Z U free@GLIBC_2.2.5 2025-05-07T20:02:27.8255913Z U ftruncate64@GLIBC_2.2.5 2025-05-07T20:02:27.8256279Z U fwrite@GLIBC_2.2.5 2025-05-07T20:02:27.8256630Z U getenv@GLIBC_2.2.5 2025-05-07T20:02:27.8256953Z U getpagesize@GLIBC_2.2.5 2025-05-07T20:02:27.8257316Z U madvise@GLIBC_2.2.5 2025-05-07T20:02:27.8257698Z U malloc@GLIBC_2.2.5 
2025-05-07T20:02:27.8258020Z U memcmp@GLIBC_2.2.5 2025-05-07T20:02:27.8258315Z U memcpy@GLIBC_2.14 2025-05-07T20:02:27.8258671Z U memmove@GLIBC_2.2.5 2025-05-07T20:02:27.8258965Z U memset@GLIBC_2.2.5 2025-05-07T20:02:27.8259270Z U mmap@GLIBC_2.2.5 2025-05-07T20:02:27.8259677Z U mprotect@GLIBC_2.2.5 2025-05-07T20:02:27.8259994Z U munmap@GLIBC_2.2.5 2025-05-07T20:02:27.8260302Z U open64@GLIBC_2.2.5 2025-05-07T20:02:27.8260685Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:02:27.8261149Z U pthread_mutex_destroy@GLIBC_2.2.5 2025-05-07T20:02:27.8261494Z U pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:02:27.8261854Z U pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:02:27.8262175Z U read@GLIBC_2.2.5 2025-05-07T20:02:27.8262505Z U realloc@GLIBC_2.2.5 2025-05-07T20:02:27.8262806Z U shm_open 2025-05-07T20:02:27.8263113Z U shm_unlink 2025-05-07T20:02:27.8263435Z U snprintf@GLIBC_2.2.5 2025-05-07T20:02:27.8263760Z U stderr@GLIBC_2.2.5 2025-05-07T20:02:27.8264083Z U strcmp@GLIBC_2.2.5 2025-05-07T20:02:27.8264396Z U strlen@GLIBC_2.2.5 2025-05-07T20:02:27.8264742Z U strtol@GLIBC_2.2.5 2025-05-07T20:02:27.8265056Z U syscall@GLIBC_2.2.5 2025-05-07T20:02:27.8265398Z U sysconf@GLIBC_2.2.5 2025-05-07T20:02:27.8265711Z U uname@GLIBC_2.2.5 2025-05-07T20:02:27.8266054Z U unlink@GLIBC_2.2.5 2025-05-07T20:02:27.8266394Z U vsnprintf@GLIBC_2.2.5 2025-05-07T20:02:27.8266791Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:02:27.8267278Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:02:27.8267742Z U vtable for __cxxabiv1::__vmi_class_type_info@CXXABI_1.3 2025-05-07T20:02:27.8268187Z w _ITM_deregisterTMCloneTable 2025-05-07T20:02:27.8268524Z w _ITM_registerTMCloneTable 2025-05-07T20:02:27.8268862Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:02:27.8269181Z w __gmon_start__ 2025-05-07T20:02:27.8269518Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:02:27.8269947Z + ldd 
./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.8270213Z 2025-05-07T20:02:27.8302329Z linux-vdso.so.1 (0x00007ffeff7b1000) 2025-05-07T20:02:27.8303425Z libtorch_cpu.so => not found 2025-05-07T20:02:27.8303731Z libtorch_cuda.so => not found 2025-05-07T20:02:27.8304019Z libtorch.so => not found 2025-05-07T20:02:27.8304371Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f124003d000) 2025-05-07T20:02:27.8304809Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f124000f000) 2025-05-07T20:02:27.8305225Z libc.so.6 => /lib64/libc.so.6 (0x00007f123fe05000) 2025-05-07T20:02:27.8305739Z libm.so.6 => /lib64/libm.so.6 (0x00007f123fd2a000) 2025-05-07T20:02:27.8306140Z /lib64/ld-linux-x86-64.so.2 (0x00007f1240320000) 2025-05-07T20:02:27.8306392Z 2025-05-07T20:02:27.8306512Z [CHECK] Displaying ELF information: 2025-05-07T20:02:27.8306917Z + readelf -d ./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so 2025-05-07T20:02:27.8307325Z 2025-05-07T20:02:27.8338885Z 2025-05-07T20:02:27.8340028Z Dynamic section at offset 0x78e78 contains 33 entries: 2025-05-07T20:02:27.8341217Z Tag Type Name/Value 2025-05-07T20:02:27.8342573Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:02:27.8344191Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:02:27.8345524Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:02:27.8346029Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:02:27.8346652Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:02:27.8347156Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:02:27.8347676Z 0x000000000000000e (SONAME) Library soname: [asmjit.so] 2025-05-07T20:02:27.8348088Z 0x000000000000000c (INIT) 0x1a000 2025-05-07T20:02:27.8348423Z 0x000000000000000d (FINI) 0x5af2c 2025-05-07T20:02:27.8348740Z 0x0000000000000019 (INIT_ARRAY) 0x780a0 2025-05-07T20:02:27.8349081Z 0x000000000000001b (INIT_ARRAYSZ) 8 (bytes) 
2025-05-07T20:02:27.8349410Z 0x000000000000001a (FINI_ARRAY) 0x780a8 2025-05-07T20:02:27.8349789Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:02:27.8350108Z 0x0000000000000004 (HASH) 0x200 2025-05-07T20:02:27.8350611Z 0x000000006ffffef5 (GNU_HASH) 0x1e18 2025-05-07T20:02:27.8350942Z 0x0000000000000005 (STRTAB) 0x86e0 2025-05-07T20:02:27.8351460Z 0x0000000000000006 (SYMTAB) 0x3b80 2025-05-07T20:02:27.8351887Z 0x000000000000000a (STRSZ) 45342 (bytes) 2025-05-07T20:02:27.8352257Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:02:27.8352620Z 0x0000000000000003 (PLTGOT) 0x790d8 2025-05-07T20:02:27.8352979Z 0x0000000000000002 (PLTRELSZ) 8064 (bytes) 2025-05-07T20:02:27.8353346Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:02:27.8353677Z 0x0000000000000017 (JMPREL) 0x17220 2025-05-07T20:02:27.8354026Z 0x0000000000000007 (RELA) 0x13ed8 2025-05-07T20:02:27.8354390Z 0x0000000000000008 (RELASZ) 13128 (bytes) 2025-05-07T20:02:27.8354772Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:02:27.8355118Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:02:27.8355453Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:02:27.8355827Z 0x000000006ffffffe (VERNEED) 0x13e48 2025-05-07T20:02:27.8356169Z 0x000000006fffffff (VERNEEDNUM) 3 2025-05-07T20:02:27.8356535Z 0x000000006ffffff0 (VERSYM) 0x137fe 2025-05-07T20:02:27.8356897Z 0x000000006ffffff9 (RELACOUNT) 3 2025-05-07T20:02:27.8357225Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:02:27.8357464Z 2025-05-07T20:02:27.8357588Z ################################################################################ 2025-05-07T20:02:27.8357829Z 2025-05-07T20:02:27.8357834Z 2025-05-07T20:02:27.8357977Z ################################################################################ 2025-05-07T20:02:27.8358432Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.8358903Z [CHECK] Listing out library size: 2025-05-07T20:02:27.8359321Z + du -h --block-size=1M 
./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.8359666Z 2025-05-07T20:02:27.8359827Z 6 ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.8360083Z 2025-05-07T20:02:27.8360443Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.8361376Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.8361950Z 2025-05-07T20:02:27.8694608Z GLIBC_2.2.5 2025-05-07T20:02:27.8695288Z GLIBC_2.3 2025-05-07T20:02:27.8695872Z GLIBC_2.14 2025-05-07T20:02:27.8696219Z 2025-05-07T20:02:27.8696232Z 2025-05-07T20:02:27.8697313Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.8700687Z + objdump -TC ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:02:27.8702462Z 2025-05-07T20:02:27.8947950Z GLIBCXX_3.4 2025-05-07T20:02:27.8948639Z GLIBCXX_3.4.9 2025-05-07T20:02:27.8949289Z GLIBCXX_3.4.11 2025-05-07T20:02:27.8949914Z GLIBCXX_3.4.14 2025-05-07T20:02:27.8950822Z GLIBCXX_3.4.15 2025-05-07T20:02:27.8951443Z GLIBCXX_3.4.18 2025-05-07T20:02:27.8952032Z GLIBCXX_3.4.21 2025-05-07T20:02:27.8952435Z 2025-05-07T20:02:27.8952563Z 2025-05-07T20:02:27.8971276Z + nm -gDC ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so > /tmp/tmp.5PgrlHAEo7.symbols.txt 2025-05-07T20:02:27.8972578Z 2025-05-07T20:02:27.9182936Z 2025-05-07T20:02:27.9210833Z [CHECK] Total Number of symbols: 4871 2025-05-07T20:02:27.9246404Z [CHECK] Number of fbgemm symbols: 3365 2025-05-07T20:02:27.9264999Z + nm -gDCu ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so > /tmp/tmp.YUEtzZdFUS.usymbols.txt 2025-05-07T20:02:27.9265464Z 2025-05-07T20:02:27.9297813Z 2025-05-07T20:02:27.9329415Z [CHECK] Listing out undefined symbols (135 total): 
2025-05-07T20:02:27.9341747Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:02:27.9342905Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:02:27.9343894Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:02:27.9344493Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:02:27.9344841Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:02:27.9345206Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:02:27.9345556Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:02:27.9345926Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:02:27.9346271Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:02:27.9346673Z U __cxa_init_primary_exception@CXXABI_1.3.11 2025-05-07T20:02:27.9347047Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:02:27.9347410Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:02:27.9347756Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:02:27.9348140Z U __cxa_throw_bad_array_new_length@CXXABI_1.3.8 2025-05-07T20:02:27.9348540Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:02:27.9348883Z U __once_proxy@GLIBCXX_3.4.11 2025-05-07T20:02:27.9349233Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:02:27.9349544Z U abort@GLIBC_2.2.5 2025-05-07T20:02:27.9349997Z U asmjit::_abi_1_13::BaseAssembler::bind(asmjit::_abi_1_13::Label const&) 2025-05-07T20:02:27.9350481Z U asmjit::_abi_1_13::BaseAssembler::newLabel() 2025-05-07T20:02:27.9351042Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:02:27.9351849Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:02:27.9352854Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:02:27.9354240Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:02:27.9355589Z U 
asmjit::_abi_1_13::BaseEmitter::emitArgsAssignment(asmjit::_abi_1_13::FuncFrame const&, asmjit::_abi_1_13::FuncArgsAssignment const&) 2025-05-07T20:02:27.9356503Z U asmjit::_abi_1_13::BaseEmitter::emitEpilog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:02:27.9357129Z U asmjit::_abi_1_13::BaseEmitter::emitProlog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:02:27.9357772Z U asmjit::_abi_1_13::CodeHolder::CodeHolder(asmjit::_abi_1_13::Support::Temporary const*) 2025-05-07T20:02:27.9358408Z U asmjit::_abi_1_13::CodeHolder::init(asmjit::_abi_1_13::Environment const&, unsigned long) 2025-05-07T20:02:27.9358948Z U asmjit::_abi_1_13::CodeHolder::~CodeHolder() 2025-05-07T20:02:27.9359482Z U asmjit::_abi_1_13::FuncArgsAssignment::updateFuncFrame(asmjit::_abi_1_13::FuncFrame&) const 2025-05-07T20:02:27.9360303Z U asmjit::_abi_1_13::FuncDetail::init(asmjit::_abi_1_13::FuncSignature const&, asmjit::_abi_1_13::Environment const&) 2025-05-07T20:02:27.9360953Z U asmjit::_abi_1_13::FuncFrame::finalize() 2025-05-07T20:02:27.9361391Z U asmjit::_abi_1_13::FuncFrame::init(asmjit::_abi_1_13::FuncDetail const&) 2025-05-07T20:02:27.9362032Z U asmjit::_abi_1_13::JitRuntime::JitRuntime(asmjit::_abi_1_13::JitAllocator::CreateParams const*) 2025-05-07T20:02:27.9362675Z U asmjit::_abi_1_13::JitRuntime::_add(void**, asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:02:27.9363128Z U asmjit::_abi_1_13::JitRuntime::~JitRuntime() 2025-05-07T20:02:27.9363638Z U asmjit::_abi_1_13::x86::Assembler::Assembler(asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:02:27.9364102Z U asmjit::_abi_1_13::x86::Assembler::~Assembler() 2025-05-07T20:02:27.9364476Z U cpuinfo_get_packages 2025-05-07T20:02:27.9364790Z U cpuinfo_get_packages_count 2025-05-07T20:02:27.9365128Z U cpuinfo_initialize 2025-05-07T20:02:27.9365442Z U cpuinfo_isa 2025-05-07T20:02:27.9365708Z U fma@GLIBC_2.2.5 2025-05-07T20:02:27.9366006Z U fmaf@GLIBC_2.2.5 2025-05-07T20:02:27.9366288Z U fminf@GLIBC_2.2.5 2025-05-07T20:02:27.9366591Z U 
free@GLIBC_2.2.5 2025-05-07T20:02:27.9366868Z U fwrite@GLIBC_2.2.5 2025-05-07T20:02:27.9367175Z U getenv@GLIBC_2.2.5 2025-05-07T20:02:27.9367462Z U log2@GLIBC_2.2.5 2025-05-07T20:02:27.9367762Z U log2f@GLIBC_2.2.5 2025-05-07T20:02:27.9368071Z U lrintf@GLIBC_2.2.5 2025-05-07T20:02:27.9368354Z U memcmp@GLIBC_2.2.5 2025-05-07T20:02:27.9368665Z U memcpy@GLIBC_2.14 2025-05-07T20:02:27.9368954Z U memmove@GLIBC_2.2.5 2025-05-07T20:02:27.9369266Z U memset@GLIBC_2.2.5 2025-05-07T20:02:27.9369718Z U nearbyint@GLIBC_2.2.5 2025-05-07T20:02:27.9370049Z U nearbyintf@GLIBC_2.2.5 2025-05-07T20:02:27.9370408Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:02:27.9370819Z U operator delete[](void*)@GLIBCXX_3.4 2025-05-07T20:02:27.9371199Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:02:27.9371564Z U operator new[](unsigned long)@GLIBCXX_3.4 2025-05-07T20:02:27.9371920Z U posix_memalign@GLIBC_2.2.5 2025-05-07T20:02:27.9372217Z U pow@GLIBC_2.2.5 2025-05-07T20:02:27.9372507Z U sqrtf@GLIBC_2.2.5 2025-05-07T20:02:27.9372892Z U std::_Hash_bytes(void const*, unsigned long, unsigned long)@CXXABI_1.3.5 2025-05-07T20:02:27.9373395Z U std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:02:27.9373855Z U std::_Rb_tree_increment(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:02:27.9374530Z U std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4 2025-05-07T20:02:27.9375280Z U std::__atomic_futex_unsigned_base::_M_futex_notify_all(unsigned int*)@GLIBCXX_3.4.21 2025-05-07T20:02:27.9376290Z U std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)@GLIBCXX_3.4.21 2025-05-07T20:02:27.9377524Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:02:27.9378263Z U 
std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:02:27.9378787Z U std::__exception_ptr::exception_ptr::_M_addref() 2025-05-07T20:02:27.9379215Z U std::__exception_ptr::exception_ptr::_M_release() 2025-05-07T20:02:27.9379982Z U std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11 2025-05-07T20:02:27.9380560Z U std::__future_base::_Result_base::_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:02:27.9381081Z U std::__future_base::_Result_base::~_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:02:27.9381542Z U std::__once_call@GLIBCXX_3.4.11 2025-05-07T20:02:27.9381898Z U std::__once_callable@GLIBCXX_3.4.11 2025-05-07T20:02:27.9382282Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:02:27.9382676Z U std::__throw_bad_array_new_length() 2025-05-07T20:02:27.9383043Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:02:27.9383419Z U std::__throw_bad_function_call()@GLIBCXX_3.4.14 2025-05-07T20:02:27.9383837Z U std::__throw_future_error(int)@GLIBCXX_3.4.14 2025-05-07T20:02:27.9384263Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:02:27.9384679Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:02:27.9385103Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:02:27.9385488Z U std::bad_alloc::~bad_alloc()@GLIBCXX_3.4 2025-05-07T20:02:27.9386352Z U std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.9387188Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:02:27.9387514Z U std::cout@GLIBCXX_3.4 2025-05-07T20:02:27.9387909Z U std::ctype<char>::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:02:27.9388322Z U std::future_category()@GLIBCXX_3.4.15 2025-05-07T20:02:27.9388723Z U std::future_error::~future_error()@GLIBCXX_3.4.14 2025-05-07T20:02:27.9389130Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:02:27.9389491Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:02:27.9390170Z U 
std::logic_error::logic_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21 2025-05-07T20:02:27.9390917Z U std::logic_error::logic_error(std::logic_error const&)@GLIBCXX_3.4.21 2025-05-07T20:02:27.9391464Z U std::ostream& std::ostream::_M_insert<double>(double)@GLIBCXX_3.4.9 2025-05-07T20:02:27.9391987Z U std::ostream& std::ostream::_M_insert<long>(long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.9392668Z U std::ostream& std::ostream::_M_insert<unsigned long>(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:02:27.9393158Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:02:27.9393562Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:02:27.9393918Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:02:27.9394422Z U std::rethrow_exception(std::__exception_ptr::exception_ptr)@CXXABI_1.3.3 2025-05-07T20:02:27.9395053Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:02:27.9395490Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:02:27.9395855Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:02:27.9396159Z U stderr@GLIBC_2.2.5 2025-05-07T20:02:27.9396463Z U strcmp@GLIBC_2.2.5 2025-05-07T20:02:27.9396738Z U strlen@GLIBC_2.2.5 2025-05-07T20:02:27.9397030Z U strstr@GLIBC_2.2.5 2025-05-07T20:02:27.9397306Z U tolower@GLIBC_2.2.5 2025-05-07T20:02:27.9397610Z U toupper@GLIBC_2.2.5 2025-05-07T20:02:27.9397973Z U typeinfo for std::__future_base::_Result_base@GLIBCXX_3.4.15 2025-05-07T20:02:27.9398423Z U typeinfo for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:02:27.9398802Z U typeinfo for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:02:27.9399198Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:02:27.9399594Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:02:27.9400004Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:02:27.9400772Z U vtable for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:02:27.9401212Z U vtable for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:02:27.9401694Z w 
_ITM_deregisterTMCloneTable 2025-05-07T20:02:27.9402061Z w _ITM_registerTMCloneTable 2025-05-07T20:02:27.9402389Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:02:27.9402726Z w __gmon_start__ 2025-05-07T20:02:27.9403009Z w __pthread_key_create 2025-05-07T20:02:27.9403346Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:02:27.9403692Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:02:27.9404033Z w pthread_once 2025-05-07T20:02:27.9404316Z w pthread_rwlock_rdlock 2025-05-07T20:02:27.9404640Z w pthread_rwlock_unlock 2025-05-07T20:02:27.9404963Z w pthread_rwlock_wrlock 2025-05-07T20:02:27.9405274Z w pthread_self@GLIBC_2.2.5 2025-05-07T20:02:27.9405662Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:02:27.9406077Z + ldd ./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.9406361Z 2025-05-07T20:02:27.9406492Z linux-vdso.so.1 (0x00007fff314a8000) 2025-05-07T20:02:27.9406797Z libc10.so => not found 2025-05-07T20:02:27.9407352Z asmjit.so => /__w/FBGEMM/FBGEMM/fbgemm_gpu/./_skbuild/linux-x86_64-3.13/cmake-build/asmjit.so (0x00007f7670e8a000) 2025-05-07T20:02:27.9407956Z libtorch.so => not found 2025-05-07T20:02:27.9408232Z libtorch_cpu.so => not found 2025-05-07T20:02:27.9408535Z libtorch_cuda.so => not found 2025-05-07T20:02:27.9408883Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f767059c000) 2025-05-07T20:02:27.9409307Z libm.so.6 => /lib64/libm.so.6 (0x00007f7670dad000) 2025-05-07T20:02:27.9409706Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7670d7f000) 2025-05-07T20:02:27.9410112Z libc.so.6 => /lib64/libc.so.6 (0x00007f7670394000) 2025-05-07T20:02:27.9410499Z /lib64/ld-linux-x86-64.so.2 (0x00007f7670f09000) 2025-05-07T20:02:27.9410837Z libtorch_cpu.so => not found 2025-05-07T20:02:27.9411133Z libtorch_cuda.so => not found 2025-05-07T20:02:27.9411408Z libtorch.so => not found 2025-05-07T20:02:27.9411578Z 2025-05-07T20:02:27.9411718Z [CHECK] Displaying ELF information: 2025-05-07T20:02:27.9412107Z + readelf -d 
./_skbuild/linux-x86_64-3.13/cmake-build/fbgemm.so 2025-05-07T20:02:27.9412415Z 2025-05-07T20:02:27.9452147Z 2025-05-07T20:02:27.9452945Z Dynamic section at offset 0x51fb38 contains 38 entries: 2025-05-07T20:02:27.9454181Z Tag Type Name/Value 2025-05-07T20:02:27.9454989Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:02:27.9455535Z 0x0000000000000001 (NEEDED) Shared library: [asmjit.so] 2025-05-07T20:02:27.9456073Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:02:27.9456607Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:02:27.9457171Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:02:27.9457705Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:02:27.9458248Z 0x0000000000000001 (NEEDED) Shared library: [libm.so.6] 2025-05-07T20:02:27.9458758Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:02:27.9459287Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:02:27.9460086Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:02:27.9460633Z 0x000000000000000e (SONAME) Library soname: [fbgemm.so] 2025-05-07T20:02:27.9461188Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T20:02:27.9461624Z 0x000000000000000c (INIT) 0xf6000 2025-05-07T20:02:27.9461993Z 0x000000000000000d (FINI) 0x4c8fb0 2025-05-07T20:02:27.9462348Z 0x0000000000000019 (INIT_ARRAY) 0x51dac0 2025-05-07T20:02:27.9462727Z 0x000000000000001b (INIT_ARRAYSZ) 56 (bytes) 2025-05-07T20:02:27.9463089Z 0x000000000000001a (FINI_ARRAY) 0x51daf8 2025-05-07T20:02:27.9463506Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:02:27.9463878Z 0x0000000000000004 (HASH) 0x238 2025-05-07T20:02:27.9464217Z 0x000000006ffffef5 (GNU_HASH) 0x6e20 2025-05-07T20:02:27.9464588Z 0x0000000000000005 (STRTAB) 0x2b0a0 2025-05-07T20:02:27.9464931Z 0x0000000000000006 (SYMTAB) 0xe7e0 2025-05-07T20:02:27.9465318Z 
0x000000000000000a (STRSZ) 708057 (bytes) 2025-05-07T20:02:27.9465695Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:02:27.9466077Z 0x0000000000000003 (PLTGOT) 0x520dd8 2025-05-07T20:02:27.9466469Z 0x0000000000000002 (PLTRELSZ) 24312 (bytes) 2025-05-07T20:02:27.9466839Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:02:27.9467195Z 0x0000000000000017 (JMPREL) 0xef8e0 2025-05-07T20:02:27.9467537Z 0x0000000000000007 (RELA) 0xda610 2025-05-07T20:02:27.9467911Z 0x0000000000000008 (RELASZ) 86736 (bytes) 2025-05-07T20:02:27.9468284Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:02:27.9468643Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:02:27.9468986Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:02:27.9469374Z 0x000000006ffffffe (VERNEED) 0xda490 2025-05-07T20:02:27.9469746Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:02:27.9470084Z 0x000000006ffffff0 (VERSYM) 0xd7e7a 2025-05-07T20:02:27.9470451Z 0x000000006ffffff9 (RELACOUNT) 9 2025-05-07T20:02:27.9470766Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:02:27.9470999Z 2025-05-07T20:02:27.9471119Z ################################################################################ 2025-05-07T20:02:27.9471352Z 2025-05-07T20:02:27.9471356Z 2025-05-07T20:02:27.9471577Z [CHECK] Verifying sample subset of symbols in the built libraries ... 
2025-05-07T20:02:27.9644251Z [CHECK] Found symbol in ./_skbuild/linux-x86_64-3.13/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so: fbgemm_gpu::per_tensor_quantize_i8 2025-05-07T20:02:27.9645217Z ################################################################################ 2025-05-07T20:02:27.9645776Z [BUILD] Wheel Audit: dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:27.9646239Z 2025-05-07T20:02:27.9653867Z + conda run --no-capture-output -n build_binary auditwheel show dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:27.9655342Z 2025-05-07T20:02:31.4761765Z 2025-05-07T20:02:31.4762398Z fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.4763022Z is consistent with the following platform tag: "linux_x86_64". 2025-05-07T20:02:31.4763330Z 2025-05-07T20:02:31.4763503Z The wheel references external versioned symbols in these 2025-05-07T20:02:31.4763993Z system-provided shared libraries: libgcc_s.so.1 with versions 2025-05-07T20:02:31.4764446Z {'GCC_3.4', 'GCC_3.0'}, libstdc++.so.6 with versions 2025-05-07T20:02:31.4764853Z {'GLIBCXX_3.4.15', 'GLIBCXX_3.4.21', 'GLIBCXX_3.4.9', 2025-05-07T20:02:31.4765300Z 'GLIBCXX_3.4.11', 'GLIBCXX_3.4.20', 'CXXABI_1.3.11', 'GLIBCXX_3.4.29', 2025-05-07T20:02:31.4765771Z 'CXXABI_1.3.3', 'CXXABI_1.3.7', 'CXXABI_1.3.5', 'GLIBCXX_3.4.14', 2025-05-07T20:02:31.4766438Z 'CXXABI_1.3', 'CXXABI_1.3.9', 'GLIBCXX_3.4.18', 'CXXABI_1.3.8', 2025-05-07T20:02:31.4766926Z 'GLIBCXX_3.4'}, libc.so.6 with versions {'GLIBC_2.14', 'GLIBC_2.2.5'}, 2025-05-07T20:02:31.4767531Z libm.so.6 with versions {'GLIBC_2.2.5'}, libcudart.so.12 with versions 2025-05-07T20:02:31.4767956Z {'libcudart.so.12'} 2025-05-07T20:02:31.4768104Z 2025-05-07T20:02:31.4768319Z This constrains the platform tag to "manylinux_2_34_x86_64". 
In order 2025-05-07T20:02:31.4768860Z to achieve a more compatible tag, you would need to recompile a new 2025-05-07T20:02:31.4769366Z wheel from source on a system with earlier versions of these 2025-05-07T20:02:31.4769800Z libraries, such as a recent manylinux image. 2025-05-07T20:02:31.5651148Z 2025-05-07T20:02:31.5651275Z 2025-05-07T20:02:31.5651904Z ################################################################################ 2025-05-07T20:02:31.5652573Z [BUILD] Enumerating the built wheels ... 2025-05-07T20:02:31.5653104Z + ls -lth dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.5653487Z 2025-05-07T20:02:31.5717127Z -rw-r--r--. 1 root root 18M May 7 20:02 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.5718621Z 2025-05-07T20:02:31.5718963Z [BUILD] Enumerating the wheel SHAs ... 2025-05-07T20:02:31.5726736Z + sha1sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.5727968Z 2025-05-07T20:02:31.6100897Z 891428e398d8fa44bdcd60728272fd376b27a8ba dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.6102639Z 2025-05-07T20:02:31.6105174Z + sha256sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.6105599Z 2025-05-07T20:02:31.6902592Z 86a533cac2dc47ba6525697cbaf3fe89eda98f1fc3bd69dfc08261cb1f2d2035 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.6903313Z 2025-05-07T20:02:31.6910776Z + md5sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.6911254Z 2025-05-07T20:02:31.7222044Z 4c3714dae593cf99d3df6aac70dd67cf dist/fbgemm_gpu_genai_nightly-2025.5.7-cp313-cp313-manylinux_2_28_x86_64.whl 2025-05-07T20:02:31.7222698Z 2025-05-07T20:02:31.7222904Z [BUILD] FBGEMM-GPU build + package completed 2025-05-07T20:02:31.8794084Z ##[group]Run 
actions/upload-artifact@v4 2025-05-07T20:02:31.8794438Z with: 2025-05-07T20:02:31.8794737Z name: fbgemm_genai_x86_gcc_py3.13_cu12.8.0.whl 2025-05-07T20:02:31.8795099Z path: fbgemm_gpu/dist/*.whl 2025-05-07T20:02:31.8795427Z if-no-files-found: error 2025-05-07T20:02:31.8795718Z compression-level: 6 2025-05-07T20:02:31.8796048Z overwrite: false 2025-05-07T20:02:31.8796319Z include-hidden-files: false 2025-05-07T20:02:31.8796646Z env: 2025-05-07T20:02:31.8796937Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T20:02:31.8797270Z BUILD_ENV: build_binary 2025-05-07T20:02:31.8797577Z BUILD_TARGET: genai 2025-05-07T20:02:31.8797833Z BUILD_VARIANT: cuda 2025-05-07T20:02:31.8798134Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T20:02:31.8798415Z ##[endgroup] 2025-05-07T20:02:31.8815649Z ##[command]/usr/bin/docker exec 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:02:32.9606017Z With the provided path, there will be 1 file uploaded 2025-05-07T20:02:32.9608926Z Artifact name is valid! 2025-05-07T20:02:32.9609598Z Root directory input is valid! 2025-05-07T20:02:33.0706051Z Beginning upload of artifact content to blob storage 2025-05-07T20:02:33.9494992Z Uploaded bytes 8388608 2025-05-07T20:02:34.1823361Z Uploaded bytes 16777216 2025-05-07T20:02:34.3812854Z Uploaded bytes 18508688 2025-05-07T20:02:34.3957560Z Finished uploading artifact content to blob storage! 2025-05-07T20:02:34.3958756Z SHA256 digest of uploaded artifact zip is 0316113a2b3fde93fffa97b955c92dd5eef475455a84550f9225df12df45620e 2025-05-07T20:02:34.3959453Z Finalizing artifact upload 2025-05-07T20:02:34.4694467Z Artifact fbgemm_genai_x86_gcc_py3.13_cu12.8.0.whl.zip successfully finalized. Artifact ID 3081398569 2025-05-07T20:02:34.4696315Z Artifact fbgemm_genai_x86_gcc_py3.13_cu12.8.0.whl has been successfully uploaded! Final size is 18508688 bytes. 
Artifact ID is 3081398569 2025-05-07T20:02:34.4697834Z Artifact download URL: https://github.com/pytorch/FBGEMM/actions/runs/14891846252/artifacts/3081398569 2025-05-07T20:02:34.5023321Z Post job cleanup. 2025-05-07T20:02:34.5037249Z ##[command]/usr/bin/docker exec 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:02:34.8225758Z [command]/usr/bin/git version 2025-05-07T20:02:34.8492928Z git version 2.47.1 2025-05-07T20:02:34.8528007Z Copying '/github/home/.gitconfig' to '/__w/_temp/8f23ee01-981d-4d02-b852-8485665d3423/.gitconfig' 2025-05-07T20:02:34.8538849Z Temporarily overriding HOME='/__w/_temp/8f23ee01-981d-4d02-b852-8485665d3423' before making global git config changes 2025-05-07T20:02:34.8539841Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T20:02:34.8540568Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T20:02:34.8605025Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T20:02:34.8633172Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T20:02:34.9211415Z Entering 'external/asmjit' 2025-05-07T20:02:34.9335263Z Entering 'external/composable_kernel' 2025-05-07T20:02:34.9482236Z Entering 'external/cpuinfo' 2025-05-07T20:02:34.9591067Z Entering 'external/cutlass' 2025-05-07T20:02:34.9765106Z Entering 'external/googletest' 2025-05-07T20:02:34.9882399Z Entering 'external/hipify_torch' 2025-05-07T20:02:35.0006934Z Entering 'external/json' 2025-05-07T20:02:35.0112747Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T20:02:35.0133593Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0138548Z [command]/usr/bin/git config --local --unset-all 
http.https://github.com/.extraheader 2025-05-07T20:02:35.0167229Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T20:02:35.0452639Z Entering 'external/asmjit' 2025-05-07T20:02:35.0485535Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0524491Z Entering 'external/composable_kernel' 2025-05-07T20:02:35.0558698Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0623200Z Entering 'external/cpuinfo' 2025-05-07T20:02:35.0659562Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0694633Z Entering 'external/cutlass' 2025-05-07T20:02:35.0742073Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0786931Z Entering 'external/googletest' 2025-05-07T20:02:35.0821674Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0863649Z Entering 'external/hipify_torch' 2025-05-07T20:02:35.0912583Z http.https://github.com/.extraheader 2025-05-07T20:02:35.0962841Z Entering 'external/json' 2025-05-07T20:02:35.0996741Z http.https://github.com/.extraheader 2025-05-07T20:02:35.1201556Z Stop and remove container: bc8a7aa379e24ad1bb0513de8877a55e_amazonlinux2023_b22b95 2025-05-07T20:02:35.1213711Z ##[command]/usr/bin/docker rm --force 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 2025-05-07T20:02:36.2602977Z 565b81b7c816cbdd14afbfa510e3c8636c8644acf5a2e5045d5b002a6b1a6184 2025-05-07T20:02:36.2637378Z Remove container network: github_network_2fbdf5bf774c440b8886f7414b350d71 2025-05-07T20:02:36.2641674Z ##[command]/usr/bin/docker network rm github_network_2fbdf5bf774c440b8886f7414b350d71 2025-05-07T20:02:37.1901299Z github_network_2fbdf5bf774c440b8886f7414b350d71 2025-05-07T20:02:37.1937941Z A job completed hook has been configured by the self-hosted runner administrator 2025-05-07T20:02:37.2129062Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 
2025-05-07T20:02:37.2134809Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T20:02:37.2135221Z ##[endgroup] 2025-05-07T20:02:49.1896645Z Cleaning up orphan processes