2025-05-07T19:42:32.7585052Z Current runner version: '2.323.0' 2025-05-07T19:42:32.7591025Z Runner name: 'i-08cf476e34faaf25f' 2025-05-07T19:42:32.7592062Z Machine name: 'ip-10-0-56-8' 2025-05-07T19:42:32.7594812Z ##[group]GITHUB_TOKEN Permissions 2025-05-07T19:42:32.7597257Z Contents: read 2025-05-07T19:42:32.7597895Z Metadata: read 2025-05-07T19:42:32.7598469Z Packages: read 2025-05-07T19:42:32.7598958Z ##[endgroup] 2025-05-07T19:42:32.7601286Z Secret source: None 2025-05-07T19:42:32.7601968Z Prepare workflow directory 2025-05-07T19:42:32.8214323Z Prepare all required actions 2025-05-07T19:42:32.8252001Z Getting action download info 2025-05-07T19:42:32.9927638Z Download action repository 'actions/checkout@v4' (SHA:11bd71901bbe5b1630ceea73d27597364c9af683) 2025-05-07T19:42:33.2818479Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-05-07T19:42:33.8173866Z Complete job name: build_artifact (x86, linux.24xlarge, genai, 3.12, 12.8.0, gcc) 2025-05-07T19:42:33.9098853Z A job started hook has been configured by the self-hosted runner administrator 2025-05-07T19:42:33.9238633Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh' 2025-05-07T19:42:33.9248896Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:33.9250393Z ##[endgroup] 2025-05-07T19:42:35.0317252Z Runner Type: linux.24xlarge 2025-05-07T19:42:35.0317788Z Instance Type: c5.24xlarge 2025-05-07T19:42:35.0318106Z AMI Name: unknown 2025-05-07T19:42:35.0359911Z AMI ID: ami-071226ecf16aa7d96 2025-05-07T19:42:40.1134035Z ##[group]Checking docker version 2025-05-07T19:42:40.1147318Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}' 2025-05-07T19:42:40.1365734Z '1.44' 2025-05-07T19:42:40.1382035Z Docker daemon API version: '1.44' 2025-05-07T19:42:40.1382597Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}' 2025-05-07T19:42:40.1561894Z '1.44' 2025-05-07T19:42:40.1571951Z Docker client API 
version: '1.44' 2025-05-07T19:42:40.1580068Z ##[endgroup] 2025-05-07T19:42:40.1583154Z ##[group]Clean up resources from previous jobs 2025-05-07T19:42:40.1589384Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=808603" 2025-05-07T19:42:40.1760000Z ##[command]/usr/bin/docker network prune --force --filter "label=808603" 2025-05-07T19:42:40.1907825Z ##[endgroup] 2025-05-07T19:42:40.1908200Z ##[group]Create local container network 2025-05-07T19:42:40.1918618Z ##[command]/usr/bin/docker network create --label 808603 github_network_3a00299775bc40d0bdf0cd164938c757 2025-05-07T19:42:40.4242743Z f18216f2beb75e49c2241a5f4e66cb01c72cdc2d0f9316e484d8d987ed63d5b5 2025-05-07T19:42:40.4261836Z ##[endgroup] 2025-05-07T19:42:40.4286515Z ##[group]Starting job container 2025-05-07T19:42:40.4305927Z ##[command]/usr/bin/docker pull amazonlinux:2023 2025-05-07T19:42:40.6647387Z 2023: Pulling from library/amazonlinux 2025-05-07T19:42:40.7226331Z 1c3112c87ab2: Pulling fs layer 2025-05-07T19:42:41.2850714Z 1c3112c87ab2: Verifying Checksum 2025-05-07T19:42:41.2852805Z 1c3112c87ab2: Download complete 2025-05-07T19:42:42.7088570Z 1c3112c87ab2: Pull complete 2025-05-07T19:42:42.7239791Z Digest: sha256:cb5b4c509d62ae388f674c139ae5e8281fc160c217d474445e912043e1941988 2025-05-07T19:42:42.7289352Z Status: Downloaded newer image for amazonlinux:2023 2025-05-07T19:42:42.7314878Z docker.io/library/amazonlinux:2023 2025-05-07T19:42:42.7410758Z ##[command]/usr/bin/docker create --name 78fd0503c2694dd883e9ecfdef44701d_amazonlinux2023_f768cc --label 808603 --workdir /__w/FBGEMM/FBGEMM --network github_network_3a00299775bc40d0bdf0cd164938c757 --user root -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/ec2-user/actions-runner/_work":"/__w" -v "/home/ec2-user/actions-runner/externals":"/__e":ro -v "/home/ec2-user/actions-runner/_work/_temp":"/__w/_temp" -v 
"/home/ec2-user/actions-runner/_work/_actions":"/__w/_actions" -v "/home/ec2-user/actions-runner/_work/_tool":"/__w/_tool" -v "/home/ec2-user/actions-runner/_work/_temp/_github_home":"/github/home" -v "/home/ec2-user/actions-runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" amazonlinux:2023 "-f" "/dev/null" 2025-05-07T19:42:43.0793551Z 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce 2025-05-07T19:42:43.0818315Z ##[command]/usr/bin/docker start 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce 2025-05-07T19:42:43.5815202Z 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce 2025-05-07T19:42:43.5838556Z ##[command]/usr/bin/docker ps --all --filter id=3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce --filter status=running --no-trunc --format "{{.ID}} {{.Status}}" 2025-05-07T19:42:43.5986226Z 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce Up Less than a second 2025-05-07T19:42:43.6004783Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce 2025-05-07T19:42:43.6162972Z CI=true 2025-05-07T19:42:43.6163505Z HOME=/github/home 2025-05-07T19:42:43.6163932Z GITHUB_ACTIONS=true 2025-05-07T19:42:43.6164477Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:42:43.6185753Z ##[endgroup] 2025-05-07T19:42:43.6196925Z ##[group]Waiting for all services to be ready 2025-05-07T19:42:43.6198714Z ##[endgroup] 2025-05-07T19:42:43.6282848Z ##[group]Run yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which 2025-05-07T19:42:43.6283648Z yum update -y; yum install -y binutils findutils git pciutils sudo tar wget which 2025-05-07T19:42:43.6284636Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:42:43.6285012Z env: 2025-05-07T19:42:43.6285324Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:43.6285765Z BUILD_ENV: 
build_binary 2025-05-07T19:42:43.6286057Z BUILD_TARGET: genai 2025-05-07T19:42:43.6286385Z BUILD_VARIANT: cuda 2025-05-07T19:42:43.6286676Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:43.6287014Z ##[endgroup] 2025-05-07T19:42:44.4797336Z Amazon Linux 2023 repository 67 MB/s | 37 MB 00:00 2025-05-07T19:42:51.0951775Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:42:51.6555842Z Dependencies resolved. 2025-05-07T19:42:51.6735870Z Nothing to do. 2025-05-07T19:42:51.6736531Z Complete! 2025-05-07T19:42:51.9152536Z Last metadata expiration check: 0:00:07 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:42:51.9786559Z Dependencies resolved. 2025-05-07T19:42:52.0012351Z ======================================================================================== 2025-05-07T19:42:52.0012999Z Package Arch Version Repository Size 2025-05-07T19:42:52.0013601Z ======================================================================================== 2025-05-07T19:42:52.0014084Z Installing: 2025-05-07T19:42:52.0014630Z binutils x86_64 2.41-50.amzn2023.0.3 amazonlinux 5.3 M 2025-05-07T19:42:52.0015267Z findutils x86_64 1:4.8.0-2.amzn2023.0.2 amazonlinux 539 k 2025-05-07T19:42:52.0015871Z git x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 54 k 2025-05-07T19:42:52.0016423Z pciutils x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 93 k 2025-05-07T19:42:52.0017098Z sudo x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 1.3 M 2025-05-07T19:42:52.0017679Z tar x86_64 2:1.34-1.amzn2023.0.4 amazonlinux 879 k 2025-05-07T19:42:52.0018198Z wget x86_64 1.21.3-1.amzn2023.0.4 amazonlinux 779 k 2025-05-07T19:42:52.0018838Z which x86_64 2.21-26.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:52.0019299Z Installing dependencies: 2025-05-07T19:42:52.0019768Z cracklib x86_64 2.9.6-27.amzn2023.0.2 amazonlinux 82 k 2025-05-07T19:42:52.0020449Z cyrus-sasl-lib x86_64 2.1.27-18.amzn2023.0.3 amazonlinux 786 k 2025-05-07T19:42:52.0021365Z elfutils-debuginfod-client x86_64 0.188-3.amzn2023.0.2 
amazonlinux 41 k 2025-05-07T19:42:52.0022037Z git-core x86_64 2.47.1-1.amzn2023.0.2 amazonlinux 4.7 M 2025-05-07T19:42:52.0022683Z git-core-doc noarch 2.47.1-1.amzn2023.0.2 amazonlinux 2.8 M 2025-05-07T19:42:52.0023437Z gnutls x86_64 3.8.3-6.amzn2023.0.1 amazonlinux 1.1 M 2025-05-07T19:42:52.0024054Z groff-base x86_64 1.22.4-7.amzn2023.0.2 amazonlinux 1.0 M 2025-05-07T19:42:52.0024662Z gzip x86_64 1.12-1.amzn2023.0.1 amazonlinux 160 k 2025-05-07T19:42:52.0025270Z hwdata noarch 0.384-1.amzn2023.0.3 amazonlinux 1.6 M 2025-05-07T19:42:52.0025859Z jansson x86_64 2.14-0.amzn2023 amazonlinux 46 k 2025-05-07T19:42:52.0163841Z kmod-libs x86_64 29-2.amzn2023.0.5 amazonlinux 62 k 2025-05-07T19:42:52.0164461Z less x86_64 608-2.amzn2023.0.2 amazonlinux 168 k 2025-05-07T19:42:52.0165226Z libcbor x86_64 0.7.0-3.amzn2023.0.2 amazonlinux 57 k 2025-05-07T19:42:52.0165723Z libdb x86_64 5.3.28-49.amzn2023.0.2 amazonlinux 756 k 2025-05-07T19:42:52.0166241Z libeconf x86_64 0.4.0-1.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:42:52.0166761Z libedit x86_64 3.1-38.20210714cvs.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:52.0167260Z libfdisk x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 153 k 2025-05-07T19:42:52.0167865Z libfido2 x86_64 1.10.0-2.amzn2023.0.2 amazonlinux 95 k 2025-05-07T19:42:52.0168491Z libmetalink x86_64 0.1.3-14.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:52.0169322Z libpwquality x86_64 1.4.4-6.amzn2023.0.2 amazonlinux 106 k 2025-05-07T19:42:52.0169969Z libsemanage x86_64 3.4-5.amzn2023.0.2 amazonlinux 121 k 2025-05-07T19:42:52.0170514Z libutempter x86_64 1.2.1-4.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:52.0171007Z nano x86_64 8.3-1.amzn2023 amazonlinux 706 k 2025-05-07T19:42:52.0171502Z ncurses x86_64 6.2-4.20200222.amzn2023.0.6 amazonlinux 394 k 2025-05-07T19:42:52.0172012Z nettle x86_64 3.10.1-1.amzn2023.0.1 amazonlinux 573 k 2025-05-07T19:42:52.0172521Z openldap x86_64 2.4.57-6.amzn2023.0.7 amazonlinux 256 k 2025-05-07T19:42:52.0173033Z openssh x86_64 
8.7p1-8.amzn2023.0.14 amazonlinux 454 k 2025-05-07T19:42:52.0173557Z openssh-clients x86_64 8.7p1-8.amzn2023.0.14 amazonlinux 708 k 2025-05-07T19:42:52.0174088Z pam x86_64 1.5.1-8.amzn2023.0.4 amazonlinux 542 k 2025-05-07T19:42:52.0174593Z pciutils-libs x86_64 3.7.0-3.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:52.0175161Z perl-AutoLoader noarch 5.74-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:52.0175796Z perl-B x86_64 1.80-477.amzn2023.0.6 amazonlinux 179 k 2025-05-07T19:42:52.0176308Z perl-Carp noarch 1.50-458.amzn2023.0.2 amazonlinux 29 k 2025-05-07T19:42:52.0176876Z perl-Class-Struct noarch 0.66-477.amzn2023.0.6 amazonlinux 22 k 2025-05-07T19:42:52.0177466Z perl-Data-Dumper x86_64 2.174-460.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:52.0178028Z perl-Digest noarch 1.20-1.amzn2023.0.2 amazonlinux 26 k 2025-05-07T19:42:52.0178711Z perl-Digest-MD5 x86_64 2.58-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:52.0179271Z perl-DynaLoader x86_64 1.47-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:52.0179857Z perl-Encode x86_64 4:3.15-462.amzn2023.0.2 amazonlinux 1.7 M 2025-05-07T19:42:52.0180400Z perl-Errno x86_64 1.30-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:52.0180991Z perl-Error noarch 1:0.17029-5.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:52.0181538Z perl-Exporter noarch 5.74-459.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:52.0182085Z perl-Fcntl x86_64 1.13-477.amzn2023.0.6 amazonlinux 21 k 2025-05-07T19:42:52.0182636Z perl-File-Basename noarch 2.85-477.amzn2023.0.6 amazonlinux 18 k 2025-05-07T19:42:52.0183280Z perl-File-Find noarch 1.37-477.amzn2023.0.6 amazonlinux 26 k 2025-05-07T19:42:52.0184044Z perl-File-Path noarch 2.18-2.amzn2023.0.2 amazonlinux 36 k 2025-05-07T19:42:52.0184648Z perl-File-Temp noarch 1:0.231.100-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:52.0185352Z perl-File-stat noarch 1.09-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:52.0185983Z perl-FileHandle noarch 2.03-477.amzn2023.0.6 amazonlinux 16 k 
2025-05-07T19:42:52.0186594Z perl-Getopt-Long noarch 1:2.52-2.amzn2023.0.2 amazonlinux 60 k 2025-05-07T19:42:52.0187231Z perl-Getopt-Std noarch 1.12-477.amzn2023.0.6 amazonlinux 16 k 2025-05-07T19:42:52.0187817Z perl-Git noarch 2.47.1-1.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:52.0188430Z perl-HTTP-Tiny noarch 0.078-1.amzn2023.0.3 amazonlinux 56 k 2025-05-07T19:42:52.0188996Z perl-IO x86_64 1.43-477.amzn2023.0.6 amazonlinux 87 k 2025-05-07T19:42:52.0189589Z perl-IPC-Open3 noarch 1.21-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:52.0190215Z perl-MIME-Base64 x86_64 3.16-2.amzn2023.0.2 amazonlinux 31 k 2025-05-07T19:42:52.0190807Z perl-Net-SSLeay x86_64 1.94-1.amzn2023.0.1 amazonlinux 392 k 2025-05-07T19:42:52.0191396Z perl-POSIX x86_64 1.94-477.amzn2023.0.6 amazonlinux 97 k 2025-05-07T19:42:52.0191970Z perl-PathTools x86_64 3.78-459.amzn2023.0.2 amazonlinux 85 k 2025-05-07T19:42:52.0192605Z perl-Pod-Escapes noarch 1:1.07-458.amzn2023.0.2 amazonlinux 20 k 2025-05-07T19:42:52.0193254Z perl-Pod-Perldoc noarch 3.28.01-459.amzn2023.0.3 amazonlinux 84 k 2025-05-07T19:42:52.0193868Z perl-Pod-Simple noarch 1:3.42-2.amzn2023.0.2 amazonlinux 215 k 2025-05-07T19:42:52.0194501Z perl-Pod-Usage noarch 4:2.01-2.amzn2023.0.2 amazonlinux 41 k 2025-05-07T19:42:52.0195116Z perl-Scalar-List-Utils x86_64 4:1.56-459.amzn2023.0.2 amazonlinux 71 k 2025-05-07T19:42:52.0195867Z perl-SelectSaver noarch 1.02-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:52.0196414Z perl-Socket x86_64 4:2.032-1.amzn2023.0.2 amazonlinux 55 k 2025-05-07T19:42:52.0196968Z perl-Storable x86_64 1:3.21-458.amzn2023.0.2 amazonlinux 96 k 2025-05-07T19:42:52.0197537Z perl-Symbol noarch 1.08-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:52.0198105Z perl-Term-ANSIColor noarch 5.01-459.amzn2023.0.2 amazonlinux 48 k 2025-05-07T19:42:52.0198701Z perl-Term-Cap noarch 1.17-458.amzn2023.0.2 amazonlinux 22 k 2025-05-07T19:42:52.0199252Z perl-TermReadKey x86_64 2.38-9.amzn2023.0.2 amazonlinux 36 k 
2025-05-07T19:42:52.0199924Z perl-Text-ParseWords noarch 3.30-458.amzn2023.0.2 amazonlinux 17 k 2025-05-07T19:42:52.0200736Z perl-Text-Tabs+Wrap noarch 2021.0726-1.amzn2023.0.1 amazonlinux 22 k 2025-05-07T19:42:52.0201340Z perl-Time-Local noarch 2:1.300-5.amzn2023.0.2 amazonlinux 34 k 2025-05-07T19:42:52.0201918Z perl-URI noarch 5.09-1.amzn2023.0.2 amazonlinux 108 k 2025-05-07T19:42:52.0202467Z perl-base noarch 2.27-477.amzn2023.0.6 amazonlinux 17 k 2025-05-07T19:42:52.0203055Z perl-constant noarch 1.33-459.amzn2023.0.2 amazonlinux 23 k 2025-05-07T19:42:52.0203643Z perl-if noarch 0.60.800-477.amzn2023.0.6 amazonlinux 14 k 2025-05-07T19:42:52.0204189Z perl-interpreter x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 71 k 2025-05-07T19:42:52.0204767Z perl-lib x86_64 0.65-477.amzn2023.0.6 amazonlinux 15 k 2025-05-07T19:42:52.0205308Z perl-libnet noarch 3.13-2.amzn2023.0.2 amazonlinux 126 k 2025-05-07T19:42:52.0205876Z perl-libs x86_64 4:5.32.1-477.amzn2023.0.6 amazonlinux 2.0 M 2025-05-07T19:42:52.0206444Z perl-mro x86_64 1.23-477.amzn2023.0.6 amazonlinux 29 k 2025-05-07T19:42:52.0207036Z perl-overload noarch 1.31-477.amzn2023.0.6 amazonlinux 46 k 2025-05-07T19:42:52.0207639Z perl-overloading noarch 0.02-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:52.0208208Z perl-parent noarch 1:0.238-458.amzn2023.0.2 amazonlinux 14 k 2025-05-07T19:42:52.0208946Z perl-podlators noarch 1:4.14-458.amzn2023.0.2 amazonlinux 112 k 2025-05-07T19:42:52.0209719Z perl-subs noarch 1.03-477.amzn2023.0.6 amazonlinux 12 k 2025-05-07T19:42:52.0210273Z perl-vars noarch 1.05-477.amzn2023.0.6 amazonlinux 13 k 2025-05-07T19:42:52.0210864Z shadow-utils x86_64 2:4.9-12.amzn2023.0.4 amazonlinux 1.1 M 2025-05-07T19:42:52.0211423Z systemd-libs x86_64 252.23-3.amzn2023 amazonlinux 613 k 2025-05-07T19:42:52.0211999Z util-linux x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 2.2 M 2025-05-07T19:42:52.0212582Z util-linux-core x86_64 2.37.4-1.amzn2023.0.4 amazonlinux 432 k 2025-05-07T19:42:52.0213041Z Installing 
weak dependencies: 2025-05-07T19:42:52.0213536Z nano-default-editor noarch 8.3-1.amzn2023 amazonlinux 10 k 2025-05-07T19:42:52.0214154Z perl-IO-Socket-IP noarch 0.41-3.amzn2023.0.2 amazonlinux 42 k 2025-05-07T19:42:52.0214792Z perl-IO-Socket-SSL noarch 2.075-1.amzn2023.0.2 amazonlinux 218 k 2025-05-07T19:42:52.0215400Z perl-Mozilla-CA noarch 20200520-4.amzn2023.0.2 amazonlinux 13 k 2025-05-07T19:42:52.0216006Z perl-NDBM_File x86_64 1.15-477.amzn2023.0.6 amazonlinux 23 k 2025-05-07T19:42:52.0216618Z sudo-python-plugin x86_64 1.9.15-1.p5.amzn2023.0.1 amazonlinux 56 k 2025-05-07T19:42:52.0216975Z 2025-05-07T19:42:52.0217085Z Transaction Summary 2025-05-07T19:42:52.0217416Z ======================================================================================== 2025-05-07T19:42:52.0217770Z Install 107 Packages 2025-05-07T19:42:52.0217956Z 2025-05-07T19:42:52.0218076Z Total download size: 38 M 2025-05-07T19:42:52.0218377Z Installed size: 151 M 2025-05-07T19:42:52.0218637Z Downloading Packages: 2025-05-07T19:42:52.3070621Z (1/107): cracklib-2.9.6-27.amzn2023.0.2.x86_64. 4.1 MB/s | 82 kB 00:00 2025-05-07T19:42:52.3197332Z (2/107): cyrus-sasl-lib-2.1.27-18.amzn2023.0.3. 24 MB/s | 786 kB 00:00 2025-05-07T19:42:52.3227170Z (3/107): elfutils-debuginfod-client-0.188-3.amz 2.7 MB/s | 41 kB 00:00 2025-05-07T19:42:52.3563464Z (4/107): binutils-2.41-50.amzn2023.0.3.x86_64.r 76 MB/s | 5.3 MB 00:00 2025-05-07T19:42:52.3611983Z (5/107): findutils-4.8.0-2.amzn2023.0.2.x86_64. 
13 MB/s | 539 kB 00:00 2025-05-07T19:42:52.3621692Z (6/107): git-2.47.1-1.amzn2023.0.2.x86_64.rpm 1.4 MB/s | 54 kB 00:00 2025-05-07T19:42:52.3825672Z (7/107): gnutls-3.8.3-6.amzn2023.0.1.x86_64.rpm 56 MB/s | 1.1 MB 00:00 2025-05-07T19:42:52.3986481Z (8/107): git-core-doc-2.47.1-1.amzn2023.0.2.noa 78 MB/s | 2.8 MB 00:00 2025-05-07T19:42:52.4081937Z (9/107): groff-base-1.22.4-7.amzn2023.0.2.x86_6 46 MB/s | 1.0 MB 00:00 2025-05-07T19:42:52.4272761Z (10/107): git-core-2.47.1-1.amzn2023.0.2.x86_64 67 MB/s | 4.7 MB 00:00 2025-05-07T19:42:52.4322215Z (11/107): gzip-1.12-1.amzn2023.0.1.x86_64.rpm 5.3 MB/s | 160 kB 00:00 2025-05-07T19:42:52.4403281Z (12/107): hwdata-0.384-1.amzn2023.0.3.noarch.rp 51 MB/s | 1.6 MB 00:00 2025-05-07T19:42:52.4418124Z (13/107): jansson-2.14-0.amzn2023.x86_64.rpm 3.8 MB/s | 46 kB 00:00 2025-05-07T19:42:52.4436169Z (14/107): kmod-libs-29-2.amzn2023.0.5.x86_64.rp 5.6 MB/s | 62 kB 00:00 2025-05-07T19:42:52.4474435Z (15/107): less-608-2.amzn2023.0.2.x86_64.rpm 24 MB/s | 168 kB 00:00 2025-05-07T19:42:52.4501589Z (16/107): libcbor-0.7.0-3.amzn2023.0.2.x86_64.r 9.6 MB/s | 57 kB 00:00 2025-05-07T19:42:52.4565235Z (17/107): libdb-5.3.28-49.amzn2023.0.2.x86_64.r 61 MB/s | 756 kB 00:00 2025-05-07T19:42:52.4587068Z (18/107): libeconf-0.4.0-1.amzn2023.0.3.x86_64. 
2.5 MB/s | 28 kB 00:00 2025-05-07T19:42:52.4610328Z (19/107): libedit-3.1-38.20210714cvs.amzn2023.0 12 MB/s | 108 kB 00:00 2025-05-07T19:42:52.4643223Z (20/107): libfdisk-2.37.4-1.amzn2023.0.4.x86_64 21 MB/s | 153 kB 00:00 2025-05-07T19:42:52.4689125Z (21/107): libfido2-1.10.0-2.amzn2023.0.2.x86_64 14 MB/s | 95 kB 00:00 2025-05-07T19:42:52.4696073Z (22/107): libmetalink-0.1.3-14.amzn2023.0.2.x86 4.0 MB/s | 31 kB 00:00 2025-05-07T19:42:52.4719532Z (23/107): libpwquality-1.4.4-6.amzn2023.0.2.x86 14 MB/s | 106 kB 00:00 2025-05-07T19:42:52.4772227Z (24/107): libsemanage-3.4-5.amzn2023.0.2.x86_64 18 MB/s | 121 kB 00:00 2025-05-07T19:42:52.4780394Z (25/107): libutempter-1.2.1-4.amzn2023.0.2.x86_ 3.3 MB/s | 26 kB 00:00 2025-05-07T19:42:52.4839826Z (26/107): nano-8.3-1.amzn2023.x86_64.rpm 62 MB/s | 706 kB 00:00 2025-05-07T19:42:52.4854178Z (27/107): nano-default-editor-8.3-1.amzn2023.no 1.8 MB/s | 10 kB 00:00 2025-05-07T19:42:52.4897616Z (28/107): ncurses-6.2-4.20200222.amzn2023.0.6.x 38 MB/s | 394 kB 00:00 2025-05-07T19:42:52.4947547Z (29/107): nettle-3.10.1-1.amzn2023.0.1.x86_64.r 53 MB/s | 573 kB 00:00 2025-05-07T19:42:52.4985146Z (30/107): openldap-2.4.57-6.amzn2023.0.7.x86_64 21 MB/s | 256 kB 00:00 2025-05-07T19:42:52.5018806Z (31/107): openssh-8.7p1-8.amzn2023.0.14.x86_64. 39 MB/s | 454 kB 00:00 2025-05-07T19:42:52.5082389Z (32/107): pam-1.5.1-8.amzn2023.0.4.x86_64.rpm 59 MB/s | 542 kB 00:00 2025-05-07T19:42:52.5135092Z (33/107): openssh-clients-8.7p1-8.amzn2023.0.14 39 MB/s | 708 kB 00:00 2025-05-07T19:42:52.5153403Z (34/107): pciutils-3.7.0-3.amzn2023.0.2.x86_64. 
7.0 MB/s | 93 kB 00:00 2025-05-07T19:42:52.5171393Z (35/107): pciutils-libs-3.7.0-3.amzn2023.0.2.x8 5.0 MB/s | 41 kB 00:00 2025-05-07T19:42:52.5186021Z (36/107): perl-AutoLoader-5.74-477.amzn2023.0.6 4.5 MB/s | 22 kB 00:00 2025-05-07T19:42:52.5218359Z (37/107): perl-Carp-1.50-458.amzn2023.0.2.noarc 6.4 MB/s | 29 kB 00:00 2025-05-07T19:42:52.5256760Z (38/107): perl-B-1.80-477.amzn2023.0.6.x86_64.r 22 MB/s | 179 kB 00:00 2025-05-07T19:42:52.5273492Z (39/107): perl-Class-Struct-0.66-477.amzn2023.0 2.6 MB/s | 22 kB 00:00 2025-05-07T19:42:52.5290693Z (40/107): perl-Data-Dumper-2.174-460.amzn2023.0 8.3 MB/s | 55 kB 00:00 2025-05-07T19:42:52.5314053Z (41/107): perl-Digest-1.20-1.amzn2023.0.2.noarc 4.8 MB/s | 26 kB 00:00 2025-05-07T19:42:52.5345003Z (42/107): perl-Digest-MD5-2.58-2.amzn2023.0.2.x 8.1 MB/s | 36 kB 00:00 2025-05-07T19:42:52.5360633Z (43/107): perl-DynaLoader-1.47-477.amzn2023.0.6 4.3 MB/s | 26 kB 00:00 2025-05-07T19:42:52.5491820Z (44/107): perl-Encode-3.15-462.amzn2023.0.2.x86 97 MB/s | 1.7 MB 00:00 2025-05-07T19:42:52.5506070Z (45/107): perl-Errno-1.30-477.amzn2023.0.6.x86_ 965 kB/s | 15 kB 00:00 2025-05-07T19:42:52.5518641Z (46/107): perl-Error-0.17029-5.amzn2023.0.2.noa 2.6 MB/s | 41 kB 00:00 2025-05-07T19:42:52.5545816Z (47/107): perl-Exporter-5.74-459.amzn2023.0.2.n 6.3 MB/s | 31 kB 00:00 2025-05-07T19:42:52.5584870Z (48/107): perl-Fcntl-1.13-477.amzn2023.0.6.x86_ 3.5 MB/s | 21 kB 00:00 2025-05-07T19:42:52.5597104Z (49/107): perl-File-Basename-2.85-477.amzn2023. 2.4 MB/s | 18 kB 00:00 2025-05-07T19:42:52.5630812Z (50/107): perl-File-Path-2.18-2.amzn2023.0.2.no 7.9 MB/s | 36 kB 00:00 2025-05-07T19:42:52.5654868Z (51/107): perl-File-Find-1.37-477.amzn2023.0.6. 2.4 MB/s | 26 kB 00:00 2025-05-07T19:42:52.5666868Z (52/107): perl-File-Temp-0.231.100-2.amzn2023.0 8.6 MB/s | 60 kB 00:00 2025-05-07T19:42:52.5687225Z (53/107): perl-File-stat-1.09-477.amzn2023.0.6. 
3.8 MB/s | 17 kB 00:00 2025-05-07T19:42:52.5734502Z (54/107): perl-FileHandle-2.03-477.amzn2023.0.6 2.7 MB/s | 16 kB 00:00 2025-05-07T19:42:52.5747024Z (55/107): perl-Getopt-Long-2.52-2.amzn2023.0.2. 7.9 MB/s | 60 kB 00:00 2025-05-07T19:42:52.5765033Z (56/107): perl-Getopt-Std-1.12-477.amzn2023.0.6 2.1 MB/s | 16 kB 00:00 2025-05-07T19:42:52.5805455Z (57/107): perl-HTTP-Tiny-0.078-1.amzn2023.0.3.n 11 MB/s | 56 kB 00:00 2025-05-07T19:42:52.5825980Z (58/107): perl-Git-2.47.1-1.amzn2023.0.2.noarch 5.9 MB/s | 42 kB 00:00 2025-05-07T19:42:52.5846161Z (59/107): perl-IO-1.43-477.amzn2023.0.6.x86_64. 11 MB/s | 87 kB 00:00 2025-05-07T19:42:52.5882768Z (60/107): perl-IO-Socket-IP-0.41-3.amzn2023.0.2 5.8 MB/s | 42 kB 00:00 2025-05-07T19:42:52.5915361Z (61/107): perl-IPC-Open3-1.21-477.amzn2023.0.6. 3.7 MB/s | 23 kB 00:00 2025-05-07T19:42:52.5951313Z (62/107): perl-IO-Socket-SSL-2.075-1.amzn2023.0 22 MB/s | 218 kB 00:00 2025-05-07T19:42:52.5967839Z (63/107): perl-MIME-Base64-3.16-2.amzn2023.0.2. 3.7 MB/s | 31 kB 00:00 2025-05-07T19:42:52.5985904Z (64/107): perl-Mozilla-CA-20200520-4.amzn2023.0 2.1 MB/s | 13 kB 00:00 2025-05-07T19:42:52.6027717Z (65/107): perl-NDBM_File-1.15-477.amzn2023.0.6. 4.0 MB/s | 23 kB 00:00 2025-05-07T19:42:52.6074961Z (66/107): perl-Net-SSLeay-1.94-1.amzn2023.0.1.x 38 MB/s | 392 kB 00:00 2025-05-07T19:42:52.6097613Z (67/107): perl-POSIX-1.94-477.amzn2023.0.6.x86_ 8.7 MB/s | 97 kB 00:00 2025-05-07T19:42:52.6114496Z (68/107): perl-PathTools-3.78-459.amzn2023.0.2. 10 MB/s | 85 kB 00:00 2025-05-07T19:42:52.6134713Z (69/107): perl-Pod-Escapes-1.07-458.amzn2023.0. 
3.8 MB/s | 20 kB 00:00 2025-05-07T19:42:52.6173748Z (70/107): perl-Pod-Perldoc-3.28.01-459.amzn2023 12 MB/s | 84 kB 00:00 2025-05-07T19:42:52.6206049Z (71/107): perl-Pod-Simple-3.42-2.amzn2023.0.2.n 25 MB/s | 215 kB 00:00 2025-05-07T19:42:52.6218295Z (72/107): perl-Pod-Usage-2.01-2.amzn2023.0.2.no 4.9 MB/s | 41 kB 00:00 2025-05-07T19:42:52.6249399Z (73/107): perl-Scalar-List-Utils-1.56-459.amzn2 10 MB/s | 71 kB 00:00 2025-05-07T19:42:52.6286103Z (74/107): perl-Socket-2.032-1.amzn2023.0.2.x86_ 10 MB/s | 55 kB 00:00 2025-05-07T19:42:52.6302502Z (75/107): perl-SelectSaver-1.02-477.amzn2023.0. 1.7 MB/s | 12 kB 00:00 2025-05-07T19:42:52.6328491Z (76/107): perl-Storable-3.21-458.amzn2023.0.2.x 12 MB/s | 96 kB 00:00 2025-05-07T19:42:52.6345659Z (77/107): perl-Symbol-1.08-477.amzn2023.0.6.noa 2.7 MB/s | 15 kB 00:00 2025-05-07T19:42:52.6386430Z (78/107): perl-Term-Cap-1.17-458.amzn2023.0.2.n 4.2 MB/s | 22 kB 00:00 2025-05-07T19:42:52.6404793Z (79/107): perl-Term-ANSIColor-5.01-459.amzn2023 6.7 MB/s | 48 kB 00:00 2025-05-07T19:42:52.6426772Z (80/107): perl-TermReadKey-2.38-9.amzn2023.0.2. 4.7 MB/s | 36 kB 00:00 2025-05-07T19:42:52.6444139Z (81/107): perl-Text-ParseWords-3.30-458.amzn202 3.1 MB/s | 17 kB 00:00 2025-05-07T19:42:52.6466001Z (82/107): perl-Text-Tabs+Wrap-2021.0726-1.amzn2 3.9 MB/s | 22 kB 00:00 2025-05-07T19:42:52.6507933Z (83/107): perl-URI-5.09-1.amzn2023.0.2.noarch.r 18 MB/s | 108 kB 00:00 2025-05-07T19:42:52.6535542Z (84/107): perl-Time-Local-1.300-5.amzn2023.0.2. 3.3 MB/s | 34 kB 00:00 2025-05-07T19:42:52.6546390Z (85/107): perl-base-2.27-477.amzn2023.0.6.noarc 2.1 MB/s | 17 kB 00:00 2025-05-07T19:42:52.6572759Z (86/107): perl-constant-1.33-459.amzn2023.0.2.n 4.3 MB/s | 23 kB 00:00 2025-05-07T19:42:52.6600452Z (87/107): perl-if-0.60.800-477.amzn2023.0.6.noa 3.3 MB/s | 14 kB 00:00 2025-05-07T19:42:52.6618702Z (88/107): perl-interpreter-5.32.1-477.amzn2023. 
11 MB/s | 71 kB 00:00 2025-05-07T19:42:52.6631075Z (89/107): perl-lib-0.65-477.amzn2023.0.6.x86_64 2.5 MB/s | 15 kB 00:00 2025-05-07T19:42:52.6664902Z (90/107): perl-libnet-3.13-2.amzn2023.0.2.noarc 21 MB/s | 126 kB 00:00 2025-05-07T19:42:52.6730178Z (91/107): perl-mro-1.23-477.amzn2023.0.6.x86_64 3.1 MB/s | 29 kB 00:00 2025-05-07T19:42:52.6831034Z (92/107): perl-libs-5.32.1-477.amzn2023.0.6.x86 98 MB/s | 2.0 MB 00:00 2025-05-07T19:42:52.6852714Z (93/107): perl-overload-1.31-477.amzn2023.0.6.n 2.5 MB/s | 46 kB 00:00 2025-05-07T19:42:52.6859801Z (94/107): perl-overloading-0.02-477.amzn2023.0. 1.0 MB/s | 13 kB 00:00 2025-05-07T19:42:52.6881299Z (95/107): perl-parent-0.238-458.amzn2023.0.2.no 3.2 MB/s | 14 kB 00:00 2025-05-07T19:42:52.6916871Z (96/107): perl-podlators-4.14-458.amzn2023.0.2. 23 MB/s | 112 kB 00:00 2025-05-07T19:42:52.6936879Z (97/107): perl-subs-1.03-477.amzn2023.0.6.noarc 1.8 MB/s | 12 kB 00:00 2025-05-07T19:42:52.6947147Z (98/107): perl-vars-1.05-477.amzn2023.0.6.noarc 2.0 MB/s | 13 kB 00:00 2025-05-07T19:42:52.7036285Z (99/107): shadow-utils-4.9-12.amzn2023.0.4.x86_ 97 MB/s | 1.1 MB 00:00 2025-05-07T19:42:52.7119641Z (100/107): sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 76 MB/s | 1.3 MB 00:00 2025-05-07T19:42:52.7135754Z (101/107): sudo-python-plugin-1.9.15-1.p5.amzn2 3.0 MB/s | 56 kB 00:00 2025-05-07T19:42:52.7186383Z (102/107): systemd-libs-252.23-3.amzn2023.x86_6 46 MB/s | 613 kB 00:00 2025-05-07T19:42:52.7306743Z (103/107): tar-1.34-1.amzn2023.0.4.x86_64.rpm 53 MB/s | 879 kB 00:00 2025-05-07T19:42:52.7411612Z (104/107): util-linux-2.37.4-1.amzn2023.0.4.x86 82 MB/s | 2.2 MB 00:00 2025-05-07T19:42:52.7451142Z (105/107): util-linux-core-2.37.4-1.amzn2023.0. 
17 MB/s | 432 kB 00:00 2025-05-07T19:42:52.7512425Z (106/107): wget-1.21.3-1.amzn2023.0.4.x86_64.rp 39 MB/s | 779 kB 00:00 2025-05-07T19:42:52.7530542Z (107/107): which-2.21-26.amzn2023.0.2.x86_64.rp 6.9 MB/s | 42 kB 00:00 2025-05-07T19:42:52.7547868Z -------------------------------------------------------------------------------- 2025-05-07T19:42:52.7548393Z Total 50 MB/s | 38 MB 00:00 2025-05-07T19:42:53.8020187Z Running transaction check 2025-05-07T19:42:53.8482660Z Transaction check succeeded. 2025-05-07T19:42:53.8483104Z Running transaction test 2025-05-07T19:42:54.2146793Z Transaction test succeeded. 2025-05-07T19:42:54.2147641Z Running transaction 2025-05-07T19:42:54.8967824Z Preparing : 1/1 2025-05-07T19:42:54.9108900Z Installing : systemd-libs-252.23-3.amzn2023.x86_64 1/107 2025-05-07T19:42:54.9361930Z Installing : nettle-3.10.1-1.amzn2023.0.1.x86_64 2/107 2025-05-07T19:42:54.9567733Z Installing : gnutls-3.8.3-6.amzn2023.0.1.x86_64 3/107 2025-05-07T19:42:54.9626674Z Installing : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:54.9694489Z Running scriptlet: util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 4/107 2025-05-07T19:42:54.9789184Z Installing : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:55.0076584Z Installing : ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 6/107 2025-05-07T19:42:55.0148176Z Installing : nano-8.3-1.amzn2023.x86_64 7/107 2025-05-07T19:42:55.0219459Z Installing : nano-default-editor-8.3-1.amzn2023.noarch 8/107 2025-05-07T19:42:55.0743771Z Installing : libsemanage-3.4-5.amzn2023.0.2.x86_64 9/107 2025-05-07T19:42:55.0832935Z Installing : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 10/107 2025-05-07T19:42:55.1262934Z Running scriptlet: libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:55.1329905Z Installing : libutempter-1.2.1-4.amzn2023.0.2.x86_64 11/107 2025-05-07T19:42:55.1399687Z Installing : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 12/107 2025-05-07T19:42:55.1468583Z Installing : 
libfdisk-2.37.4-1.amzn2023.0.4.x86_64 13/107 2025-05-07T19:42:55.1535127Z Installing : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 14/107 2025-05-07T19:42:55.1683944Z Installing : libeconf-0.4.0-1.amzn2023.0.3.x86_64 15/107 2025-05-07T19:42:55.1745605Z Installing : libdb-5.3.28-49.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:55.1811313Z Installing : libcbor-0.7.0-3.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:55.1890431Z Installing : libfido2-1.10.0-2.amzn2023.0.2.x86_64 18/107 2025-05-07T19:42:55.1957346Z Installing : less-608-2.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:55.2015557Z Installing : kmod-libs-29-2.amzn2023.0.5.x86_64 20/107 2025-05-07T19:42:55.2450994Z Installing : jansson-2.14-0.amzn2023.x86_64 21/107 2025-05-07T19:42:55.2539791Z Installing : hwdata-0.384-1.amzn2023.0.3.noarch 22/107 2025-05-07T19:42:55.2705841Z Installing : gzip-1.12-1.amzn2023.0.1.x86_64 23/107 2025-05-07T19:42:55.3170593Z Installing : cracklib-2.9.6-27.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:55.3368303Z Installing : pam-1.5.1-8.amzn2023.0.4.x86_64 25/107 2025-05-07T19:42:55.4204307Z Installing : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 26/107 2025-05-07T19:42:55.4205982Z Installing : util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:55.4207294Z warning: /etc/adjtime created as /etc/adjtime.rpmnew 2025-05-07T19:42:55.4207547Z 2025-05-07T19:42:55.4421474Z Running scriptlet: util-linux-2.37.4-1.amzn2023.0.4.x86_64 27/107 2025-05-07T19:42:55.4762510Z Running scriptlet: openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:55.4958960Z Installing : openssh-8.7p1-8.amzn2023.0.14.x86_64 28/107 2025-05-07T19:42:55.5029023Z Installing : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:55.6138019Z Running scriptlet: openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 29/107 2025-05-07T19:42:55.7653981Z Installing : git-core-2.47.1-1.amzn2023.0.2.x86_64 30/107 2025-05-07T19:42:55.7786064Z Installing : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 31/107 
2025-05-07T19:42:55.8198532Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:55.8286680Z Installing : groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:55.8362382Z Running scriptlet: groff-base-1.22.4-7.amzn2023.0.2.x86_64 32/107 2025-05-07T19:42:55.8434605Z Installing : perl-Digest-1.20-1.amzn2023.0.2.noarch 33/107 2025-05-07T19:42:55.8525895Z Installing : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:55.8584873Z Installing : perl-B-1.80-477.amzn2023.0.6.x86_64 35/107 2025-05-07T19:42:55.8635640Z Installing : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:55.8693009Z Installing : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 37/107 2025-05-07T19:42:55.8781799Z Installing : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 38/107 2025-05-07T19:42:55.8845329Z Installing : perl-libnet-3.13-2.amzn2023.0.2.noarch 39/107 2025-05-07T19:42:55.8942714Z Installing : perl-base-2.27-477.amzn2023.0.6.noarch 40/107 2025-05-07T19:42:55.9153051Z Installing : perl-URI-5.09-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:55.9244814Z Installing : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 42/107 2025-05-07T19:42:55.9298516Z Installing : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 43/107 2025-05-07T19:42:55.9345812Z Installing : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 44/107 2025-05-07T19:42:55.9407571Z Installing : perl-if-0.60.800-477.amzn2023.0.6.noarch 45/107 2025-05-07T19:42:55.9465742Z Installing : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:55.9521967Z Installing : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:55.9613166Z Installing : perl-File-Path-2.18-2.amzn2023.0.2.noarch 48/107 2025-05-07T19:42:55.9682443Z Installing : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 49/107 2025-05-07T19:42:55.9728093Z Installing : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 50/107 2025-05-07T19:42:55.9787527Z Installing : 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 51/107 2025-05-07T19:42:55.9849257Z Installing : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 52/107 2025-05-07T19:42:55.9906448Z Installing : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 53/107 2025-05-07T19:42:55.9953696Z Installing : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:56.0010032Z Installing : perl-subs-1.03-477.amzn2023.0.6.noarch 55/107 2025-05-07T19:42:56.0072117Z Installing : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 56/107 2025-05-07T19:42:56.0134335Z Installing : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 57/107 2025-05-07T19:42:56.0245843Z Installing : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 58/107 2025-05-07T19:42:56.0331381Z Installing : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 59/107 2025-05-07T19:42:56.0386896Z Installing : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 60/107 2025-05-07T19:42:56.0436901Z Installing : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 61/107 2025-05-07T19:42:56.0485738Z Installing : perl-Symbol-1.08-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:56.0565355Z Installing : perl-File-stat-1.09-477.amzn2023.0.6.noarch 63/107 2025-05-07T19:42:56.0665735Z Installing : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:56.0740719Z Installing : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 65/107 2025-05-07T19:42:56.0801738Z Installing : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 66/107 2025-05-07T19:42:56.0855225Z Installing : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 67/107 2025-05-07T19:42:56.0933160Z Installing : perl-mro-1.23-477.amzn2023.0.6.x86_64 68/107 2025-05-07T19:42:56.0995276Z Installing : perl-IO-1.43-477.amzn2023.0.6.x86_64 69/107 2025-05-07T19:42:56.1055813Z Installing : perl-overloading-0.02-477.amzn2023.0.6.noarch 70/107 2025-05-07T19:42:56.1128942Z Installing : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 71/107 2025-05-07T19:42:56.1176155Z Installing : perl-Errno-1.30-477.amzn2023.0.6.x86_64 72/107 
2025-05-07T19:42:56.1222767Z Installing : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 73/107 2025-05-07T19:42:56.1286411Z Installing : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:56.1366196Z Installing : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:56.1444907Z Installing : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 76/107 2025-05-07T19:42:56.1508246Z Installing : perl-constant-1.33-459.amzn2023.0.2.noarch 77/107 2025-05-07T19:42:56.1579595Z Installing : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 78/107 2025-05-07T19:42:56.1627056Z Installing : perl-overload-1.31-477.amzn2023.0.6.noarch 79/107 2025-05-07T19:42:56.1683247Z Installing : perl-parent-1:0.238-458.amzn2023.0.2.noarch 80/107 2025-05-07T19:42:56.1746317Z Installing : perl-vars-1.05-477.amzn2023.0.6.noarch 81/107 2025-05-07T19:42:56.1805430Z Installing : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 82/107 2025-05-07T19:42:56.1861625Z Installing : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 83/107 2025-05-07T19:42:56.1911116Z Installing : perl-Carp-1.50-458.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:56.1971510Z Installing : perl-Exporter-5.74-459.amzn2023.0.2.noarch 85/107 2025-05-07T19:42:56.2053472Z Installing : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 86/107 2025-05-07T19:42:56.2585603Z Installing : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 87/107 2025-05-07T19:42:56.3553400Z Installing : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 88/107 2025-05-07T19:42:56.3682976Z Installing : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:56.3765230Z Installing : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 90/107 2025-05-07T19:42:56.3835292Z Installing : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 91/107 2025-05-07T19:42:56.3899716Z Installing : perl-File-Find-1.37-477.amzn2023.0.6.noarch 92/107 2025-05-07T19:42:56.3977146Z Installing : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 93/107 2025-05-07T19:42:56.4027737Z Installing : 
perl-lib-0.65-477.amzn2023.0.6.x86_64 94/107 2025-05-07T19:42:56.4097030Z Installing : perl-Git-2.47.1-1.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:56.4174863Z Installing : git-2.47.1-1.amzn2023.0.2.x86_64 96/107 2025-05-07T19:42:56.4374578Z Installing : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 97/107 2025-05-07T19:42:56.4501651Z Installing : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 98/107 2025-05-07T19:42:56.4589274Z Installing : openldap-2.4.57-6.amzn2023.0.7.x86_64 99/107 2025-05-07T19:42:56.4991718Z Installing : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 100/107 2025-05-07T19:42:56.6223775Z Installing : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 101/107 2025-05-07T19:42:56.6308313Z Installing : binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:56.6415826Z Running scriptlet: binutils-2.41-50.amzn2023.0.3.x86_64 102/107 2025-05-07T19:42:56.6720860Z Installing : pciutils-3.7.0-3.amzn2023.0.2.x86_64 103/107 2025-05-07T19:42:56.6816916Z Installing : wget-1.21.3-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:56.7067473Z Installing : which-2.21-26.amzn2023.0.2.x86_64 105/107 2025-05-07T19:42:56.7287488Z Installing : tar-2:1.34-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:56.7365197Z Installing : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:56.7475937Z Running scriptlet: pam-1.5.1-8.amzn2023.0.4.x86_64 107/107 2025-05-07T19:42:57.5267123Z Running scriptlet: findutils-1:4.8.0-2.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:57.5268178Z Verifying : binutils-2.41-50.amzn2023.0.3.x86_64 1/107 2025-05-07T19:42:57.5268843Z Verifying : cracklib-2.9.6-27.amzn2023.0.2.x86_64 2/107 2025-05-07T19:42:57.5269459Z Verifying : cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 3/107 2025-05-07T19:42:57.5270170Z Verifying : elfutils-debuginfod-client-0.188-3.amzn2023.0.2. 
4/107 2025-05-07T19:42:57.5270833Z Verifying : findutils-1:4.8.0-2.amzn2023.0.2.x86_64 5/107 2025-05-07T19:42:57.5271384Z Verifying : git-2.47.1-1.amzn2023.0.2.x86_64 6/107 2025-05-07T19:42:57.5272124Z Verifying : git-core-2.47.1-1.amzn2023.0.2.x86_64 7/107 2025-05-07T19:42:57.5272756Z Verifying : git-core-doc-2.47.1-1.amzn2023.0.2.noarch 8/107 2025-05-07T19:42:57.5273697Z Verifying : gnutls-3.8.3-6.amzn2023.0.1.x86_64 9/107 2025-05-07T19:42:57.5274422Z Verifying : groff-base-1.22.4-7.amzn2023.0.2.x86_64 10/107 2025-05-07T19:42:57.5275002Z Verifying : gzip-1.12-1.amzn2023.0.1.x86_64 11/107 2025-05-07T19:42:57.5275617Z Verifying : hwdata-0.384-1.amzn2023.0.3.noarch 12/107 2025-05-07T19:42:57.5276252Z Verifying : jansson-2.14-0.amzn2023.x86_64 13/107 2025-05-07T19:42:57.5276881Z Verifying : kmod-libs-29-2.amzn2023.0.5.x86_64 14/107 2025-05-07T19:42:57.5277486Z Verifying : less-608-2.amzn2023.0.2.x86_64 15/107 2025-05-07T19:42:57.5278125Z Verifying : libcbor-0.7.0-3.amzn2023.0.2.x86_64 16/107 2025-05-07T19:42:57.5278755Z Verifying : libdb-5.3.28-49.amzn2023.0.2.x86_64 17/107 2025-05-07T19:42:57.5279327Z Verifying : libeconf-0.4.0-1.amzn2023.0.3.x86_64 18/107 2025-05-07T19:42:57.5280004Z Verifying : libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 19/107 2025-05-07T19:42:57.5280653Z Verifying : libfdisk-2.37.4-1.amzn2023.0.4.x86_64 20/107 2025-05-07T19:42:57.5281236Z Verifying : libfido2-1.10.0-2.amzn2023.0.2.x86_64 21/107 2025-05-07T19:42:57.5281897Z Verifying : libmetalink-0.1.3-14.amzn2023.0.2.x86_64 22/107 2025-05-07T19:42:57.5282506Z Verifying : libpwquality-1.4.4-6.amzn2023.0.2.x86_64 23/107 2025-05-07T19:42:57.5283177Z Verifying : libsemanage-3.4-5.amzn2023.0.2.x86_64 24/107 2025-05-07T19:42:57.5283875Z Verifying : libutempter-1.2.1-4.amzn2023.0.2.x86_64 25/107 2025-05-07T19:42:57.5284478Z Verifying : nano-8.3-1.amzn2023.x86_64 26/107 2025-05-07T19:42:57.5285099Z Verifying : nano-default-editor-8.3-1.amzn2023.noarch 27/107 2025-05-07T19:42:57.5285785Z Verifying : 
ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 28/107 2025-05-07T19:42:57.5286457Z Verifying : nettle-3.10.1-1.amzn2023.0.1.x86_64 29/107 2025-05-07T19:42:57.5287063Z Verifying : openldap-2.4.57-6.amzn2023.0.7.x86_64 30/107 2025-05-07T19:42:57.5287711Z Verifying : openssh-8.7p1-8.amzn2023.0.14.x86_64 31/107 2025-05-07T19:42:57.5288370Z Verifying : openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 32/107 2025-05-07T19:42:57.5288962Z Verifying : pam-1.5.1-8.amzn2023.0.4.x86_64 33/107 2025-05-07T19:42:57.5289557Z Verifying : pciutils-3.7.0-3.amzn2023.0.2.x86_64 34/107 2025-05-07T19:42:57.5290264Z Verifying : pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 35/107 2025-05-07T19:42:57.5290863Z Verifying : perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 36/107 2025-05-07T19:42:57.5291420Z Verifying : perl-B-1.80-477.amzn2023.0.6.x86_64 37/107 2025-05-07T19:42:57.5291992Z Verifying : perl-Carp-1.50-458.amzn2023.0.2.noarch 38/107 2025-05-07T19:42:57.5292559Z Verifying : perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 39/107 2025-05-07T19:42:57.5293170Z Verifying : perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 40/107 2025-05-07T19:42:57.5293761Z Verifying : perl-Digest-1.20-1.amzn2023.0.2.noarch 41/107 2025-05-07T19:42:57.5294313Z Verifying : perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 42/107 2025-05-07T19:42:57.5294908Z Verifying : perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 43/107 2025-05-07T19:42:57.5295466Z Verifying : perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 44/107 2025-05-07T19:42:57.5296050Z Verifying : perl-Errno-1.30-477.amzn2023.0.6.x86_64 45/107 2025-05-07T19:42:57.5296671Z Verifying : perl-Error-1:0.17029-5.amzn2023.0.2.noarch 46/107 2025-05-07T19:42:57.5297266Z Verifying : perl-Exporter-5.74-459.amzn2023.0.2.noarch 47/107 2025-05-07T19:42:57.5297847Z Verifying : perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 48/107 2025-05-07T19:42:57.5298411Z Verifying : perl-File-Basename-2.85-477.amzn2023.0.6.noarch 49/107 2025-05-07T19:42:57.5299017Z Verifying : 
perl-File-Find-1.37-477.amzn2023.0.6.noarch 50/107 2025-05-07T19:42:57.5299582Z Verifying : perl-File-Path-2.18-2.amzn2023.0.2.noarch 51/107 2025-05-07T19:42:57.5300174Z Verifying : perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 52/107 2025-05-07T19:42:57.5300772Z Verifying : perl-File-stat-1.09-477.amzn2023.0.6.noarch 53/107 2025-05-07T19:42:57.5301340Z Verifying : perl-FileHandle-2.03-477.amzn2023.0.6.noarch 54/107 2025-05-07T19:42:57.5301952Z Verifying : perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 55/107 2025-05-07T19:42:57.5302527Z Verifying : perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 56/107 2025-05-07T19:42:57.5303122Z Verifying : perl-Git-2.47.1-1.amzn2023.0.2.noarch 57/107 2025-05-07T19:42:57.5303771Z Verifying : perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 58/107 2025-05-07T19:42:57.5304394Z Verifying : perl-IO-1.43-477.amzn2023.0.6.x86_64 59/107 2025-05-07T19:42:57.5304963Z Verifying : perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 60/107 2025-05-07T19:42:57.5305526Z Verifying : perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 61/107 2025-05-07T19:42:57.5306137Z Verifying : perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 62/107 2025-05-07T19:42:57.5306727Z Verifying : perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 63/107 2025-05-07T19:42:57.5307280Z Verifying : perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 64/107 2025-05-07T19:42:57.5307856Z Verifying : perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 65/107 2025-05-07T19:42:57.5308394Z Verifying : perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 66/107 2025-05-07T19:42:57.5309161Z Verifying : perl-POSIX-1.94-477.amzn2023.0.6.x86_64 67/107 2025-05-07T19:42:57.5309710Z Verifying : perl-PathTools-3.78-459.amzn2023.0.2.x86_64 68/107 2025-05-07T19:42:57.5310289Z Verifying : perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 69/107 2025-05-07T19:42:57.5310874Z Verifying : perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 70/107 2025-05-07T19:42:57.5311431Z Verifying : perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 71/107 
2025-05-07T19:42:57.5312154Z Verifying : perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 72/107 2025-05-07T19:42:57.5312703Z Verifying : perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x 73/107 2025-05-07T19:42:57.5313306Z Verifying : perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 74/107 2025-05-07T19:42:57.5313862Z Verifying : perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 75/107 2025-05-07T19:42:57.5314426Z Verifying : perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 76/107 2025-05-07T19:42:57.5314998Z Verifying : perl-Symbol-1.08-477.amzn2023.0.6.noarch 77/107 2025-05-07T19:42:57.5315568Z Verifying : perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 78/107 2025-05-07T19:42:57.5316163Z Verifying : perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 79/107 2025-05-07T19:42:57.5316725Z Verifying : perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 80/107 2025-05-07T19:42:57.5317329Z Verifying : perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarc 81/107 2025-05-07T19:42:57.5317930Z Verifying : perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noa 82/107 2025-05-07T19:42:57.5318483Z Verifying : perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 83/107 2025-05-07T19:42:57.5319181Z Verifying : perl-URI-5.09-1.amzn2023.0.2.noarch 84/107 2025-05-07T19:42:57.5319726Z Verifying : perl-base-2.27-477.amzn2023.0.6.noarch 85/107 2025-05-07T19:42:57.5320316Z Verifying : perl-constant-1.33-459.amzn2023.0.2.noarch 86/107 2025-05-07T19:42:57.5320868Z Verifying : perl-if-0.60.800-477.amzn2023.0.6.noarch 87/107 2025-05-07T19:42:57.5321450Z Verifying : perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_6 88/107 2025-05-07T19:42:57.5322022Z Verifying : perl-lib-0.65-477.amzn2023.0.6.x86_64 89/107 2025-05-07T19:42:57.5322554Z Verifying : perl-libnet-3.13-2.amzn2023.0.2.noarch 90/107 2025-05-07T19:42:57.5323118Z Verifying : perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 91/107 2025-05-07T19:42:57.5323639Z Verifying : perl-mro-1.23-477.amzn2023.0.6.x86_64 92/107 2025-05-07T19:42:57.5324210Z Verifying : 
perl-overload-1.31-477.amzn2023.0.6.noarch 93/107 2025-05-07T19:42:57.5324772Z Verifying : perl-overloading-0.02-477.amzn2023.0.6.noarch 94/107 2025-05-07T19:42:57.5325353Z Verifying : perl-parent-1:0.238-458.amzn2023.0.2.noarch 95/107 2025-05-07T19:42:57.5325919Z Verifying : perl-podlators-1:4.14-458.amzn2023.0.2.noarch 96/107 2025-05-07T19:42:57.5326462Z Verifying : perl-subs-1.03-477.amzn2023.0.6.noarch 97/107 2025-05-07T19:42:57.5327021Z Verifying : perl-vars-1.05-477.amzn2023.0.6.noarch 98/107 2025-05-07T19:42:57.5327555Z Verifying : shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 99/107 2025-05-07T19:42:57.5328109Z Verifying : sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 100/107 2025-05-07T19:42:57.5328683Z Verifying : sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_ 101/107 2025-05-07T19:42:57.5329244Z Verifying : systemd-libs-252.23-3.amzn2023.x86_64 102/107 2025-05-07T19:42:57.5329797Z Verifying : tar-2:1.34-1.amzn2023.0.4.x86_64 103/107 2025-05-07T19:42:57.5330315Z Verifying : util-linux-2.37.4-1.amzn2023.0.4.x86_64 104/107 2025-05-07T19:42:57.5330883Z Verifying : util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 105/107 2025-05-07T19:42:57.5331413Z Verifying : wget-1.21.3-1.amzn2023.0.4.x86_64 106/107 2025-05-07T19:42:57.6305207Z Verifying : which-2.21-26.amzn2023.0.2.x86_64 107/107 2025-05-07T19:42:57.6305561Z 2025-05-07T19:42:57.6305691Z Installed: 2025-05-07T19:42:57.6306057Z binutils-2.41-50.amzn2023.0.3.x86_64 2025-05-07T19:42:57.6306840Z cracklib-2.9.6-27.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6307474Z cyrus-sasl-lib-2.1.27-18.amzn2023.0.3.x86_64 2025-05-07T19:42:57.6308083Z elfutils-debuginfod-client-0.188-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6308865Z findutils-1:4.8.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6309382Z git-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6309922Z git-core-2.47.1-1.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6310486Z git-core-doc-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.6311029Z gnutls-3.8.3-6.amzn2023.0.1.x86_64 
2025-05-07T19:42:57.6311591Z groff-base-1.22.4-7.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6312126Z gzip-1.12-1.amzn2023.0.1.x86_64 2025-05-07T19:42:57.6312684Z hwdata-0.384-1.amzn2023.0.3.noarch 2025-05-07T19:42:57.6313355Z jansson-2.14-0.amzn2023.x86_64 2025-05-07T19:42:57.6313886Z kmod-libs-29-2.amzn2023.0.5.x86_64 2025-05-07T19:42:57.6314432Z less-608-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6314947Z libcbor-0.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6315459Z libdb-5.3.28-49.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6315960Z libeconf-0.4.0-1.amzn2023.0.3.x86_64 2025-05-07T19:42:57.6316504Z libedit-3.1-38.20210714cvs.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6317057Z libfdisk-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6317633Z libfido2-1.10.0-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6318161Z libmetalink-0.1.3-14.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6318720Z libpwquality-1.4.4-6.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6319257Z libsemanage-3.4-5.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6319809Z libutempter-1.2.1-4.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6320333Z nano-8.3-1.amzn2023.x86_64 2025-05-07T19:42:57.6320955Z nano-default-editor-8.3-1.amzn2023.noarch 2025-05-07T19:42:57.6321617Z ncurses-6.2-4.20200222.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6322097Z nettle-3.10.1-1.amzn2023.0.1.x86_64 2025-05-07T19:42:57.6322593Z openldap-2.4.57-6.amzn2023.0.7.x86_64 2025-05-07T19:42:57.6323095Z openssh-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:57.6323602Z openssh-clients-8.7p1-8.amzn2023.0.14.x86_64 2025-05-07T19:42:57.6324104Z pam-1.5.1-8.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6324566Z pciutils-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6325076Z pciutils-libs-3.7.0-3.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6325595Z perl-AutoLoader-5.74-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6326113Z perl-B-1.80-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6326616Z perl-Carp-1.50-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.6327220Z 
perl-Class-Struct-0.66-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6327768Z perl-Data-Dumper-2.174-460.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6328281Z perl-Digest-1.20-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.6328818Z perl-Digest-MD5-2.58-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6329357Z perl-DynaLoader-1.47-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6329916Z perl-Encode-4:3.15-462.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6330463Z perl-Errno-1.30-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6330985Z perl-Error-1:0.17029-5.amzn2023.0.2.noarch 2025-05-07T19:42:57.6331550Z perl-Exporter-5.74-459.amzn2023.0.2.noarch 2025-05-07T19:42:57.6332078Z perl-Fcntl-1.13-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6332647Z perl-File-Basename-2.85-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6333216Z perl-File-Find-1.37-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6333808Z perl-File-Path-2.18-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.6334360Z perl-File-Temp-1:0.231.100-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.6334865Z perl-File-stat-1.09-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6335409Z perl-FileHandle-2.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6335945Z perl-Getopt-Long-1:2.52-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.6336469Z perl-Getopt-Std-1.12-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6336983Z perl-Git-2.47.1-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.6337480Z perl-HTTP-Tiny-0.078-1.amzn2023.0.3.noarch 2025-05-07T19:42:57.6337983Z perl-IO-1.43-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6338478Z perl-IO-Socket-IP-0.41-3.amzn2023.0.2.noarch 2025-05-07T19:42:57.6339014Z perl-IO-Socket-SSL-2.075-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.6339551Z perl-IPC-Open3-1.21-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6340064Z perl-MIME-Base64-3.16-2.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6340603Z perl-Mozilla-CA-20200520-4.amzn2023.0.2.noarch 2025-05-07T19:42:57.6341117Z perl-NDBM_File-1.15-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6341626Z 
perl-Net-SSLeay-1.94-1.amzn2023.0.1.x86_64 2025-05-07T19:42:57.6342127Z perl-POSIX-1.94-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6342667Z perl-PathTools-3.78-459.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6343293Z perl-Pod-Escapes-1:1.07-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.6344023Z perl-Pod-Perldoc-3.28.01-459.amzn2023.0.3.noarch 2025-05-07T19:42:57.6344675Z perl-Pod-Simple-1:3.42-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.6345238Z perl-Pod-Usage-4:2.01-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.6345842Z perl-Scalar-List-Utils-4:1.56-459.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6346438Z perl-SelectSaver-1.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6347051Z perl-Socket-4:2.032-1.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6347706Z perl-Storable-1:3.21-458.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6348273Z perl-Symbol-1.08-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6348906Z perl-Term-ANSIColor-5.01-459.amzn2023.0.2.noarch 2025-05-07T19:42:57.6349498Z perl-Term-Cap-1.17-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.6350107Z perl-TermReadKey-2.38-9.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6350734Z perl-Text-ParseWords-3.30-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.6351339Z perl-Text-Tabs+Wrap-2021.0726-1.amzn2023.0.1.noarch 2025-05-07T19:42:57.6351924Z perl-Time-Local-2:1.300-5.amzn2023.0.2.noarch 2025-05-07T19:42:57.6352451Z perl-URI-5.09-1.amzn2023.0.2.noarch 2025-05-07T19:42:57.6352990Z perl-base-2.27-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6353530Z perl-constant-1.33-459.amzn2023.0.2.noarch 2025-05-07T19:42:57.6354081Z perl-if-0.60.800-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6354765Z perl-interpreter-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6355299Z perl-lib-0.65-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6355951Z perl-libnet-3.13-2.amzn2023.0.2.noarch 2025-05-07T19:42:57.6356439Z perl-libs-4:5.32.1-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6356925Z perl-mro-1.23-477.amzn2023.0.6.x86_64 2025-05-07T19:42:57.6357437Z 
perl-overload-1.31-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6357958Z perl-overloading-0.02-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6358486Z perl-parent-1:0.238-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.6358989Z perl-podlators-1:4.14-458.amzn2023.0.2.noarch 2025-05-07T19:42:57.6359503Z perl-subs-1.03-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6359993Z perl-vars-1.05-477.amzn2023.0.6.noarch 2025-05-07T19:42:57.6360496Z shadow-utils-2:4.9-12.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6360978Z sudo-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:57.6361463Z sudo-python-plugin-1.9.15-1.p5.amzn2023.0.1.x86_64 2025-05-07T19:42:57.6361985Z systemd-libs-252.23-3.amzn2023.x86_64 2025-05-07T19:42:57.6362449Z tar-2:1.34-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6362923Z util-linux-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6363412Z util-linux-core-2.37.4-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6364042Z wget-1.21.3-1.amzn2023.0.4.x86_64 2025-05-07T19:42:57.6364502Z which-2.21-26.amzn2023.0.2.x86_64 2025-05-07T19:42:57.6364784Z 2025-05-07T19:42:57.6364866Z Complete! 
2025-05-07T19:42:57.7002458Z ##[group]Run actions/checkout@v4 2025-05-07T19:42:57.7002831Z with: 2025-05-07T19:42:57.7003049Z submodules: true 2025-05-07T19:42:57.7003326Z repository: pytorch/FBGEMM 2025-05-07T19:42:57.7003812Z token: *** 2025-05-07T19:42:57.7004033Z ssh-strict: true 2025-05-07T19:42:57.7004299Z ssh-user: git 2025-05-07T19:42:57.7004532Z persist-credentials: true 2025-05-07T19:42:57.7004819Z clean: true 2025-05-07T19:42:57.7005054Z sparse-checkout-cone-mode: true 2025-05-07T19:42:57.7005568Z fetch-depth: 1 2025-05-07T19:42:57.7005790Z fetch-tags: false 2025-05-07T19:42:57.7006045Z show-progress: true 2025-05-07T19:42:57.7006284Z lfs: false 2025-05-07T19:42:57.7006540Z set-safe-directory: true 2025-05-07T19:42:57.7006801Z env: 2025-05-07T19:42:57.7007050Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:42:57.7007383Z BUILD_ENV: build_binary 2025-05-07T19:42:57.7007639Z BUILD_TARGET: genai 2025-05-07T19:42:57.7007905Z BUILD_VARIANT: cuda 2025-05-07T19:42:57.7008233Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:42:57.7008491Z ##[endgroup] 2025-05-07T19:42:57.7053309Z ##[command]/usr/bin/docker exec 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce sh -c "cat /etc/*release | grep ^ID" 2025-05-07T19:42:57.9981303Z Syncing repository: pytorch/FBGEMM 2025-05-07T19:42:57.9982876Z ##[group]Getting Git version info 2025-05-07T19:42:57.9983374Z Working directory is '/__w/FBGEMM/FBGEMM' 2025-05-07T19:42:57.9984153Z [command]/usr/bin/git version 2025-05-07T19:42:57.9984540Z git version 2.47.1 2025-05-07T19:42:57.9985541Z ##[endgroup] 2025-05-07T19:42:57.9989427Z Temporarily overriding HOME='/__w/_temp/8c1d8258-d379-4ff3-91c2-de8f9af2bbf3' before making global git config changes 2025-05-07T19:42:57.9990243Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T19:42:57.9994502Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T19:42:58.0024506Z [command]/usr/bin/git 
config --local --get remote.origin.url 2025-05-07T19:42:58.0043777Z https://github.com/pytorch/FBGEMM 2025-05-07T19:42:58.0056866Z ##[group]Removing previously created refs, to avoid conflicts 2025-05-07T19:42:58.0058794Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-05-07T19:42:58.0079933Z HEAD 2025-05-07T19:42:58.0107800Z ##[endgroup] 2025-05-07T19:42:58.0110199Z [command]/usr/bin/git submodule status 2025-05-07T19:42:58.0468321Z e5d7c0bd5d9aec44d68830187138149e6a8c4e32 external/asmjit (e5d7c0b) 2025-05-07T19:42:58.0538665Z 4a61bdd4bd4ed730e078aebc7c0fcf046ff29406 external/composable_kernel (remotes/origin/FBGEMM) 2025-05-07T19:42:58.0647997Z 6543fec09b2f04ac4a666882998b534afc9c1349 external/cpuinfo (6543fec) 2025-05-07T19:42:58.0717475Z 3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3 external/cutlass (remotes/origin/FBGEMM) 2025-05-07T19:42:58.0943513Z f8d7d77c06936315286eb55f8de22cd23c188571 external/googletest (release-1.8.0-3335-gf8d7d77c) 2025-05-07T19:42:58.1017929Z 420084499c7c1e1c2d801922f40df202eac5f3a0 external/hipify_torch (remotes/origin/mmelesse-9-g4200844) 2025-05-07T19:42:58.1059773Z 9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03 external/json (v3.11.2-84-g9cca280a) 2025-05-07T19:42:58.1065140Z ##[group]Cleaning the repository 2025-05-07T19:42:58.1066797Z [command]/usr/bin/git clean -ffdx 2025-05-07T19:42:58.1120016Z [command]/usr/bin/git reset --hard HEAD 2025-05-07T19:42:58.2235350Z HEAD is now at c33feca Merge 045b315845f03385799c2c6770bf99c5550bf12b into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:58.2239295Z ##[endgroup] 2025-05-07T19:42:58.2240817Z ##[group]Disabling automatic garbage collection 2025-05-07T19:42:58.2245464Z [command]/usr/bin/git config --local gc.auto 0 2025-05-07T19:42:58.2268641Z ##[endgroup] 2025-05-07T19:42:58.2269033Z ##[group]Setting up auth 2025-05-07T19:42:58.2278013Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T19:42:58.2302326Z 
[command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T19:42:58.2580499Z Entering 'external/asmjit' 2025-05-07T19:42:58.2625655Z Entering 'external/composable_kernel' 2025-05-07T19:42:58.2684122Z Entering 'external/cpuinfo' 2025-05-07T19:42:58.2744324Z Entering 'external/cutlass' 2025-05-07T19:42:58.2803129Z Entering 'external/googletest' 2025-05-07T19:42:58.2866883Z Entering 'external/hipify_torch' 2025-05-07T19:42:58.2915334Z Entering 'external/json' 2025-05-07T19:42:58.2975539Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T19:42:58.3011410Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T19:42:58.3271027Z Entering 'external/asmjit' 2025-05-07T19:42:58.3331421Z Entering 'external/composable_kernel' 2025-05-07T19:42:58.3396026Z Entering 'external/cpuinfo' 2025-05-07T19:42:58.3448304Z Entering 'external/cutlass' 2025-05-07T19:42:58.3505602Z Entering 'external/googletest' 2025-05-07T19:42:58.3569164Z Entering 'external/hipify_torch' 2025-05-07T19:42:58.3625426Z Entering 'external/json' 2025-05-07T19:42:58.3695146Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:58.3734394Z ##[endgroup] 2025-05-07T19:42:58.3735429Z ##[group]Fetching the repository 2025-05-07T19:42:58.3737899Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +a2f4c52051596e74bc8c16e3d2867a4ecdd271e0:refs/remotes/pull/4066/merge 2025-05-07T19:42:58.5782013Z From https://github.com/pytorch/FBGEMM 2025-05-07T19:42:58.5782697Z + c33feca...a2f4c52 a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 -> 
pull/4066/merge (forced update) 2025-05-07T19:42:58.5795861Z ##[endgroup] 2025-05-07T19:42:58.5796415Z ##[group]Determining the checkout info 2025-05-07T19:42:58.5797944Z ##[endgroup] 2025-05-07T19:42:58.5801494Z [command]/usr/bin/git sparse-checkout disable 2025-05-07T19:42:58.6306883Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-05-07T19:42:58.6332941Z ##[group]Checking out the ref 2025-05-07T19:42:58.6333444Z [command]/usr/bin/git checkout --progress --force refs/remotes/pull/4066/merge 2025-05-07T19:42:58.7305577Z Warning: you are leaving 1 commit behind, not connected to 2025-05-07T19:42:58.7306003Z any of your branches: 2025-05-07T19:42:58.7306215Z 2025-05-07T19:42:58.7306617Z c33feca Merge 045b315845f03385799c2c6770bf99c5550bf12b into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:58.7307080Z 2025-05-07T19:42:58.7307302Z If you want to keep it by creating a new branch, this may be a good time 2025-05-07T19:42:58.7307745Z to do so with: 2025-05-07T19:42:58.7307889Z 2025-05-07T19:42:58.7308068Z git branch c33feca 2025-05-07T19:42:58.7308284Z 2025-05-07T19:42:58.7309128Z HEAD is now at a2f4c52 Merge 6060cd4b5f971680caecdcc657faccb5720d1c3e into fd4df5f456e0cca514bacd98a39efb72990fd9f4 2025-05-07T19:42:58.7313087Z ##[endgroup] 2025-05-07T19:42:58.7313550Z ##[group]Setting up auth for fetching submodules 2025-05-07T19:42:58.7316808Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-05-07T19:42:58.7356050Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-05-07T19:42:58.7381943Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-05-07T19:42:58.7408242Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-05-07T19:42:58.7427924Z ##[endgroup] 2025-05-07T19:42:58.7428392Z ##[group]Fetching submodules 2025-05-07T19:42:58.7430347Z 
[command]/usr/bin/git submodule sync 2025-05-07T19:42:58.7737603Z Synchronizing submodule url for 'external/asmjit' 2025-05-07T19:42:58.7738476Z Synchronizing submodule url for 'external/composable_kernel' 2025-05-07T19:42:58.7738971Z Synchronizing submodule url for 'external/cpuinfo' 2025-05-07T19:42:58.7739428Z Synchronizing submodule url for 'external/cutlass' 2025-05-07T19:42:58.7739893Z Synchronizing submodule url for 'external/googletest' 2025-05-07T19:42:58.7740351Z Synchronizing submodule url for 'external/hipify_torch' 2025-05-07T19:42:58.7740810Z Synchronizing submodule url for 'external/json' 2025-05-07T19:42:58.7744498Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --depth=1 2025-05-07T19:42:58.8498506Z Submodule path 'external/asmjit': checked out 'e5d7c0bd5d9aec44d68830187138149e6a8c4e32' 2025-05-07T19:42:59.1272034Z Submodule path 'external/composable_kernel': checked out '4a61bdd4bd4ed730e078aebc7c0fcf046ff29406' 2025-05-07T19:42:59.2298123Z Submodule path 'external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-05-07T19:42:59.9228934Z Submodule path 'external/cutlass': checked out '3ed8d2ec4ba35ef5d9d8353826209b6f868f63d3' 2025-05-07T19:42:59.9667190Z Submodule path 'external/googletest': checked out 'f8d7d77c06936315286eb55f8de22cd23c188571' 2025-05-07T19:42:59.9751097Z Submodule path 'external/hipify_torch': checked out '420084499c7c1e1c2d801922f40df202eac5f3a0' 2025-05-07T19:43:00.0977002Z Submodule path 'external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-05-07T19:43:00.0985656Z [command]/usr/bin/git submodule foreach git config --local gc.auto 0 2025-05-07T19:43:00.1266364Z Entering 'external/asmjit' 2025-05-07T19:43:00.1291725Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.1328410Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.1358718Z Entering 'external/cutlass' 2025-05-07T19:43:00.1396029Z Entering 'external/googletest' 2025-05-07T19:43:00.1426202Z 
Entering 'external/hipify_torch' 2025-05-07T19:43:00.1458771Z Entering 'external/json' 2025-05-07T19:43:00.1499792Z ##[endgroup] 2025-05-07T19:43:00.1500304Z ##[group]Persisting credentials for submodules 2025-05-07T19:43:00.1503175Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-05-07T19:43:00.1768761Z Entering 'external/asmjit' 2025-05-07T19:43:00.1811103Z url.https://github.com/.insteadof 2025-05-07T19:43:00.1812088Z url.https://github.com/.insteadof 2025-05-07T19:43:00.1854019Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.1887096Z url.https://github.com/.insteadof 2025-05-07T19:43:00.1888110Z url.https://github.com/.insteadof 2025-05-07T19:43:00.1931215Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.1979200Z url.https://github.com/.insteadof 2025-05-07T19:43:00.1980215Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2011682Z Entering 'external/cutlass' 2025-05-07T19:43:00.2046803Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2047219Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2091122Z Entering 'external/googletest' 2025-05-07T19:43:00.2136398Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2137206Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2170816Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.2203983Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2204693Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2229785Z Entering 'external/json' 2025-05-07T19:43:00.2275969Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2277025Z url.https://github.com/.insteadof 2025-05-07T19:43:00.2336984Z [command]/usr/bin/git submodule foreach sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 
2025-05-07T19:43:00.2664326Z Entering 'external/asmjit' 2025-05-07T19:43:00.2706835Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/asmjit/config remote.origin.url 2025-05-07T19:43:00.2707385Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.2760980Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/composable_kernel/config remote.origin.url 2025-05-07T19:43:00.2762598Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.2810211Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cpuinfo/config remote.origin.url 2025-05-07T19:43:00.2811617Z Entering 'external/cutlass' 2025-05-07T19:43:00.2858711Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/cutlass/config remote.origin.url 2025-05-07T19:43:00.2860151Z Entering 'external/googletest' 2025-05-07T19:43:00.2905851Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/googletest/config remote.origin.url 2025-05-07T19:43:00.2906420Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.2957644Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/hipify_torch/config remote.origin.url 2025-05-07T19:43:00.2959160Z Entering 'external/json' 2025-05-07T19:43:00.3006248Z file:/__w/FBGEMM/FBGEMM/.git/modules/external/json/config remote.origin.url 2025-05-07T19:43:00.3147556Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-05-07T19:43:00.3432078Z Entering 'external/asmjit' 2025-05-07T19:43:00.3458766Z Entering 'external/composable_kernel' 2025-05-07T19:43:00.3490969Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.3514367Z Entering 'external/cutlass' 2025-05-07T19:43:00.3542084Z Entering 'external/googletest' 2025-05-07T19:43:00.3574306Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.3602874Z Entering 'external/json' 2025-05-07T19:43:00.3649634Z [command]/usr/bin/git submodule foreach git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-05-07T19:43:00.3936835Z Entering 'external/asmjit' 2025-05-07T19:43:00.3956686Z Entering 
'external/composable_kernel' 2025-05-07T19:43:00.3993270Z Entering 'external/cpuinfo' 2025-05-07T19:43:00.4023143Z Entering 'external/cutlass' 2025-05-07T19:43:00.4049859Z Entering 'external/googletest' 2025-05-07T19:43:00.4070844Z Entering 'external/hipify_torch' 2025-05-07T19:43:00.4097937Z Entering 'external/json' 2025-05-07T19:43:00.4151660Z ##[endgroup] 2025-05-07T19:43:00.4180300Z [command]/usr/bin/git log -1 --format=%H 2025-05-07T19:43:00.4202762Z a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:43:00.4369090Z ##[group]Run . $PRELUDE; print_system_info 2025-05-07T19:43:00.4369472Z . $PRELUDE; print_system_info 2025-05-07T19:43:00.4369928Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:00.4370240Z env: 2025-05-07T19:43:00.4370448Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:00.4370771Z BUILD_ENV: build_binary 2025-05-07T19:43:00.4371004Z BUILD_TARGET: genai 2025-05-07T19:43:00.4371237Z BUILD_VARIANT: cuda 2025-05-07T19:43:00.4371461Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:00.4371707Z ##[endgroup] 2025-05-07T19:43:00.8464056Z ################################################################################ 2025-05-07T19:43:00.8465175Z # Print System Info 2025-05-07T19:43:00.8465828Z # 2025-05-07T19:43:00.8479879Z # [2025-05-07T19:43:00.847Z] + print_system_info 2025-05-07T19:43:00.8480975Z ################################################################################ 2025-05-07T19:43:00.8481659Z 2025-05-07T19:43:00.8482116Z ################################################################################ 2025-05-07T19:43:00.8483103Z [INFO] Printing environment variables ... 
2025-05-07T19:43:00.8483989Z + printenv 2025-05-07T19:43:00.8484324Z 2025-05-07T19:43:00.8489125Z GITHUB_WORKSPACE=/__w/FBGEMM/FBGEMM 2025-05-07T19:43:00.8490018Z BUILD_VARIANT=cuda 2025-05-07T19:43:00.8490736Z HOSTNAME=3634184bf123 2025-05-07T19:43:00.8491934Z GITHUB_PATH=/__w/_temp/_runner_file_commands/add_path_fbfcfe03-b314-4c30-aeac-f2076cc4937d 2025-05-07T19:43:00.8493335Z GITHUB_ACTION=__run_2 2025-05-07T19:43:00.8493992Z GITHUB_RUN_NUMBER=10601 2025-05-07T19:43:00.8494707Z RUNNER_NAME=i-08cf476e34faaf25f 2025-05-07T19:43:00.8495527Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-05-07T19:43:00.8496385Z PLATFORM_NAME_LC=linux-x86_64 2025-05-07T19:43:00.8497252Z MACHINE_NAME_LC=x86_64 2025-05-07T19:43:00.8497495Z GITHUB_TRIGGERING_ACTOR=q10 2025-05-07T19:43:00.8497799Z PRELUDE=.github/scripts/setup_env.bash 2025-05-07T19:43:00.8498099Z GITHUB_REF_TYPE=branch 2025-05-07T19:43:00.8498608Z *** 2025-05-07T19:43:00.8498817Z GITHUB_REPOSITORY_ID=150154628 2025-05-07T19:43:00.8499105Z GITHUB_ACTIONS=true 2025-05-07T19:43:00.8499390Z GITHUB_SHA=a2f4c52051596e74bc8c16e3d2867a4ecdd271e0 2025-05-07T19:43:00.8500059Z GITHUB_WORKFLOW_REF=pytorch/FBGEMM/.github/workflows/fbgemm_gpu_ci_cuda.yml@refs/pull/4066/merge 2025-05-07T19:43:00.8500940Z RUNNER_ENVIRONMENT=self-hosted 2025-05-07T19:43:00.8501217Z GITHUB_REF=refs/pull/4066/merge 2025-05-07T19:43:00.8501599Z RUNNER_OS=Linux 2025-05-07T19:43:00.8501812Z GITHUB_REF_PROTECTED=false 2025-05-07T19:43:00.8502063Z HOME=/github/home 2025-05-07T19:43:00.8502302Z GITHUB_API_URL=https://api.github.com 2025-05-07T19:43:00.8502594Z RUNNER_ARCH=X64 2025-05-07T19:43:00.8502803Z RUNNER_TEMP=/__w/_temp 2025-05-07T19:43:00.8503045Z BUILD_TARGET=genai 2025-05-07T19:43:00.8503585Z GITHUB_STATE=/__w/_temp/_runner_file_commands/save_state_fbfcfe03-b314-4c30-aeac-f2076cc4937d 2025-05-07T19:43:00.8504438Z GITHUB_ENV=/__w/_temp/_runner_file_commands/set_env_fbfcfe03-b314-4c30-aeac-f2076cc4937d 2025-05-07T19:43:00.8504954Z 
GITHUB_EVENT_PATH=/github/workflow/event.json 2025-05-07T19:43:00.8505289Z GITHUB_EVENT_NAME=pull_request 2025-05-07T19:43:00.8505584Z GITHUB_RUN_ID=14891846252 2025-05-07T19:43:00.8506059Z GITHUB_STEP_SUMMARY=/__w/_temp/_runner_file_commands/step_summary_fbfcfe03-b314-4c30-aeac-f2076cc4937d 2025-05-07T19:43:00.8506604Z BUILD_ENV=build_binary 2025-05-07T19:43:00.8506843Z GITHUB_ACTOR=q10 2025-05-07T19:43:00.8507120Z GITHUB_RUN_ATTEMPT=1 2025-05-07T19:43:00.8507372Z KERN_NAME_LC=linux 2025-05-07T19:43:00.8507602Z BUILD_CUDA_VERSION=12.8.0 2025-05-07T19:43:00.8507925Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-05-07T19:43:00.8508285Z PLATFORM_NAME=Linux-x86_64 2025-05-07T19:43:00.8508601Z GITHUB_SERVER_URL=https://github.com 2025-05-07T19:43:00.8509129Z SHLVL=1 2025-05-07T19:43:00.8509360Z GITHUB_ACTOR_ID=255046 2025-05-07T19:43:00.8509630Z RUNNER_TOOL_CACHE=/__w/_tool 2025-05-07T19:43:00.8510189Z GITHUB_WORKFLOW_SHA=6060cd4b5f971680caecdcc657faccb5720d1c3e 2025-05-07T19:43:00.8510599Z GITHUB_REF_NAME=4066/merge 2025-05-07T19:43:00.8510852Z KERN_NAME=Linux 2025-05-07T19:43:00.8511101Z GITHUB_JOB=build_artifact 2025-05-07T19:43:00.8511368Z GITHUB_REPOSITORY=pytorch/FBGEMM 2025-05-07T19:43:00.8511671Z GITHUB_RETENTION_DAYS=90 2025-05-07T19:43:00.8511941Z RUNNER_WORKSPACE=/__w/FBGEMM 2025-05-07T19:43:00.8512225Z GITHUB_ACTION_REPOSITORY= 2025-05-07T19:43:00.8512576Z PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-05-07T19:43:00.8512980Z GITHUB_BASE_REF=main 2025-05-07T19:43:00.8513314Z CI=true 2025-05-07T19:43:00.8513528Z GITHUB_REPOSITORY_OWNER=pytorch 2025-05-07T19:43:00.8513836Z GITHUB_HEAD_REF=bm/genai-rocm-oss-6 2025-05-07T19:43:00.8514124Z GITHUB_ACTION_REF= 2025-05-07T19:43:00.8514519Z GITHUB_WORKFLOW=FBGEMM GPU/GenAI CUDA CI 2025-05-07T19:43:00.8515116Z GITHUB_OUTPUT=/__w/_temp/_runner_file_commands/set_output_fbfcfe03-b314-4c30-aeac-f2076cc4937d 2025-05-07T19:43:00.8515587Z MACHINE_NAME=x86_64 2025-05-07T19:43:00.8515803Z 
_=/usr/bin/printenv 2025-05-07T19:43:00.8515948Z 2025-05-07T19:43:00.8516066Z ################################################################################ 2025-05-07T19:43:00.8516392Z [INFO] Print ldd version ... 2025-05-07T19:43:00.8516638Z + ldd --version 2025-05-07T19:43:00.8516767Z 2025-05-07T19:43:00.8516868Z ldd (GNU libc) 2.34 2025-05-07T19:43:00.8517122Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:43:00.8517564Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:43:00.8518082Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:43:00.8518532Z Written by Roland McGrath and Ulrich Drepper. 2025-05-07T19:43:00.8518766Z 2025-05-07T19:43:00.8518896Z ################################################################################ 2025-05-07T19:43:00.8519196Z [INFO] Print CPU info ... 2025-05-07T19:43:00.8519712Z + nproc 2025-05-07T19:43:00.8519820Z 2025-05-07T19:43:00.8519913Z 96 2025-05-07T19:43:00.8520693Z 2025-05-07T19:43:00.8521085Z + lscpu 2025-05-07T19:43:00.8521192Z 2025-05-07T19:43:00.8782398Z Architecture: x86_64 2025-05-07T19:43:00.8783041Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:43:00.8783619Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8784326Z Byte Order: Little Endian 2025-05-07T19:43:00.8784682Z CPU(s): 96 2025-05-07T19:43:00.8785113Z On-line CPU(s) list: 0-95 2025-05-07T19:43:00.8785448Z Vendor ID: GenuineIntel 2025-05-07T19:43:00.8785904Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8786298Z CPU family: 6 2025-05-07T19:43:00.8786610Z Model: 85 2025-05-07T19:43:00.8786925Z Thread(s) per core: 2 2025-05-07T19:43:00.8787231Z Core(s) per socket: 24 2025-05-07T19:43:00.8787661Z Socket(s): 2 2025-05-07T19:43:00.8787930Z Stepping: 7 2025-05-07T19:43:00.8788238Z BogoMIPS: 5999.99 2025-05-07T19:43:00.8790430Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush 
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.8792643Z Hypervisor vendor: KVM 2025-05-07T19:43:00.8798564Z Virtualization type: full 2025-05-07T19:43:00.8799230Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:43:00.8799631Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:43:00.8800025Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:43:00.8800407Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:43:00.8800762Z NUMA node(s): 2 2025-05-07T19:43:00.8801080Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:43:00.8801449Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:43:00.8801944Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:43:00.8802527Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:43:00.8803044Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:43:00.8803661Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:43:00.8804264Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:43:00.8804885Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:43:00.8805545Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:43:00.8805962Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:43:00.8806350Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:43:00.8806756Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:43:00.8807324Z Vulnerability Spectre v1: 
Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:43:00.8808188Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:43:00.8809143Z Vulnerability Srbds: Not affected 2025-05-07T19:43:00.8809558Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:43:00.8809813Z 2025-05-07T19:43:00.8809935Z + cat /proc/cpuinfo 2025-05-07T19:43:00.8810076Z 2025-05-07T19:43:00.8810425Z processor : 0 2025-05-07T19:43:00.8810683Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8810943Z cpu family : 6 2025-05-07T19:43:00.8811184Z model : 85 2025-05-07T19:43:00.8811634Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8812019Z stepping : 7 2025-05-07T19:43:00.8812257Z microcode : 0x5003901 2025-05-07T19:43:00.8812494Z cpu MHz : 3236.940 2025-05-07T19:43:00.8812768Z cache size : 36608 KB 2025-05-07T19:43:00.8813008Z physical id : 0 2025-05-07T19:43:00.8813253Z siblings : 48 2025-05-07T19:43:00.8813466Z core id : 0 2025-05-07T19:43:00.8813697Z cpu cores : 24 2025-05-07T19:43:00.8813910Z apicid : 0 2025-05-07T19:43:00.8814141Z initial apicid : 0 2025-05-07T19:43:00.8814366Z fpu : yes 2025-05-07T19:43:00.8814603Z fpu_exception : yes 2025-05-07T19:43:00.8814830Z cpuid level : 13 2025-05-07T19:43:00.8815070Z wp : yes 2025-05-07T19:43:00.8817385Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8820044Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8820643Z bogomips : 5999.99 2025-05-07T19:43:00.8820907Z clflush size : 64 2025-05-07T19:43:00.8821158Z cache_alignment : 64 2025-05-07T19:43:00.8821461Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8821910Z power management: 2025-05-07T19:43:00.8822076Z 2025-05-07T19:43:00.8822172Z processor : 1 2025-05-07T19:43:00.8822410Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8822690Z cpu family : 6 2025-05-07T19:43:00.8822931Z model : 85 2025-05-07T19:43:00.8823222Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8823731Z stepping : 7 2025-05-07T19:43:00.8823954Z microcode : 0x5003901 2025-05-07T19:43:00.8824215Z cpu MHz : 3254.202 2025-05-07T19:43:00.8824518Z cache size : 36608 KB 2025-05-07T19:43:00.8824764Z physical id : 0 2025-05-07T19:43:00.8824981Z siblings : 48 2025-05-07T19:43:00.8825231Z core id : 1 2025-05-07T19:43:00.8825439Z cpu cores : 24 2025-05-07T19:43:00.8825667Z apicid : 2 2025-05-07T19:43:00.8825872Z initial apicid : 2 2025-05-07T19:43:00.8826113Z fpu : yes 2025-05-07T19:43:00.8826323Z fpu_exception : yes 2025-05-07T19:43:00.8826564Z cpuid level : 13 2025-05-07T19:43:00.8826798Z wp : yes 2025-05-07T19:43:00.8829060Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8831699Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8832302Z bogomips : 5999.99 2025-05-07T19:43:00.8832523Z clflush size : 64 2025-05-07T19:43:00.8832765Z cache_alignment : 64 2025-05-07T19:43:00.8833040Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8833387Z power management: 2025-05-07T19:43:00.8833524Z 2025-05-07T19:43:00.8833617Z processor : 2 2025-05-07T19:43:00.8833856Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8834104Z cpu family : 6 2025-05-07T19:43:00.8834333Z model : 85 2025-05-07T19:43:00.8834638Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8835093Z stepping : 7 2025-05-07T19:43:00.8835333Z microcode : 0x5003901 2025-05-07T19:43:00.8835567Z cpu MHz : 3159.214 2025-05-07T19:43:00.8835812Z cache size : 36608 KB 2025-05-07T19:43:00.8836045Z physical id : 0 2025-05-07T19:43:00.8836284Z siblings : 48 2025-05-07T19:43:00.8836493Z core id : 2 2025-05-07T19:43:00.8836716Z cpu cores : 24 2025-05-07T19:43:00.8836925Z apicid : 4 2025-05-07T19:43:00.8837145Z initial apicid : 4 2025-05-07T19:43:00.8837362Z fpu : yes 2025-05-07T19:43:00.8837582Z fpu_exception : yes 2025-05-07T19:43:00.8837819Z cpuid level : 13 2025-05-07T19:43:00.8838033Z wp : yes 2025-05-07T19:43:00.8840404Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8842964Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8843537Z bogomips : 5999.99 2025-05-07T19:43:00.8843771Z clflush size : 64 2025-05-07T19:43:00.8843991Z cache_alignment : 64 2025-05-07T19:43:00.8844275Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8844596Z power management: 2025-05-07T19:43:00.8844745Z 2025-05-07T19:43:00.8844831Z processor : 3 2025-05-07T19:43:00.8845124Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8845396Z cpu family : 6 2025-05-07T19:43:00.8845623Z model : 85 2025-05-07T19:43:00.8845903Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8846277Z stepping : 7 2025-05-07T19:43:00.8846484Z microcode : 0x5003901 2025-05-07T19:43:00.8846733Z cpu MHz : 3184.510 2025-05-07T19:43:00.8846952Z cache size : 36608 KB 2025-05-07T19:43:00.8847192Z physical id : 0 2025-05-07T19:43:00.8847405Z siblings : 48 2025-05-07T19:43:00.8847629Z core id : 3 2025-05-07T19:43:00.8847836Z cpu cores : 24 2025-05-07T19:43:00.8848067Z apicid : 6 2025-05-07T19:43:00.8848269Z initial apicid : 6 2025-05-07T19:43:00.8848507Z fpu : yes 2025-05-07T19:43:00.8848732Z fpu_exception : yes 2025-05-07T19:43:00.8848960Z cpuid level : 13 2025-05-07T19:43:00.8849197Z wp : yes 2025-05-07T19:43:00.8851391Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8853981Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8854575Z bogomips : 5999.99 2025-05-07T19:43:00.8854803Z clflush size : 64 2025-05-07T19:43:00.8855053Z cache_alignment : 64 2025-05-07T19:43:00.8855328Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8855681Z power management: 2025-05-07T19:43:00.8855815Z 2025-05-07T19:43:00.8855907Z processor : 4 2025-05-07T19:43:00.8856148Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8856399Z cpu family : 6 2025-05-07T19:43:00.8856634Z model : 85 2025-05-07T19:43:00.8856934Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8857290Z stepping : 7 2025-05-07T19:43:00.8857529Z microcode : 0x5003901 2025-05-07T19:43:00.8857760Z cpu MHz : 3191.635 2025-05-07T19:43:00.8858084Z cache size : 36608 KB 2025-05-07T19:43:00.8858314Z physical id : 0 2025-05-07T19:43:00.8858558Z siblings : 48 2025-05-07T19:43:00.8858769Z core id : 4 2025-05-07T19:43:00.8858993Z cpu cores : 24 2025-05-07T19:43:00.8859198Z apicid : 8 2025-05-07T19:43:00.8859422Z initial apicid : 8 2025-05-07T19:43:00.8859636Z fpu : yes 2025-05-07T19:43:00.8859854Z fpu_exception : yes 2025-05-07T19:43:00.8860093Z cpuid level : 13 2025-05-07T19:43:00.8860301Z wp : yes 2025-05-07T19:43:00.8862507Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8865440Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8866028Z bogomips : 5999.99 2025-05-07T19:43:00.8866268Z clflush size : 64 2025-05-07T19:43:00.8866488Z cache_alignment : 64 2025-05-07T19:43:00.8866785Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8867118Z power management: 2025-05-07T19:43:00.8867277Z 2025-05-07T19:43:00.8867367Z processor : 5 2025-05-07T19:43:00.8867594Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8867865Z cpu family : 6 2025-05-07T19:43:00.8868207Z model : 85 2025-05-07T19:43:00.8868588Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8868968Z stepping : 7 2025-05-07T19:43:00.8869183Z microcode : 0x5003901 2025-05-07T19:43:00.8869436Z cpu MHz : 3219.048 2025-05-07T19:43:00.8869661Z cache size : 36608 KB 2025-05-07T19:43:00.8870034Z physical id : 0 2025-05-07T19:43:00.8870248Z siblings : 48 2025-05-07T19:43:00.8870472Z core id : 5 2025-05-07T19:43:00.8870676Z cpu cores : 24 2025-05-07T19:43:00.8870904Z apicid : 10 2025-05-07T19:43:00.8871112Z initial apicid : 10 2025-05-07T19:43:00.8871349Z fpu : yes 2025-05-07T19:43:00.8871570Z fpu_exception : yes 2025-05-07T19:43:00.8871789Z cpuid level : 13 2025-05-07T19:43:00.8872019Z wp : yes 2025-05-07T19:43:00.8874208Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8876766Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8877356Z bogomips : 5999.99 2025-05-07T19:43:00.8877577Z clflush size : 64 2025-05-07T19:43:00.8877818Z cache_alignment : 64 2025-05-07T19:43:00.8878092Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8878437Z power management: 2025-05-07T19:43:00.8878570Z 2025-05-07T19:43:00.8878657Z processor : 6 2025-05-07T19:43:00.8878895Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8879157Z cpu family : 6 2025-05-07T19:43:00.8879364Z model : 85 2025-05-07T19:43:00.8879662Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8880012Z stepping : 7 2025-05-07T19:43:00.8880240Z microcode : 0x5003901 2025-05-07T19:43:00.8880469Z cpu MHz : 3238.136 2025-05-07T19:43:00.8880707Z cache size : 36608 KB 2025-05-07T19:43:00.8880935Z physical id : 0 2025-05-07T19:43:00.8881241Z siblings : 48 2025-05-07T19:43:00.8881446Z core id : 6 2025-05-07T19:43:00.8881669Z cpu cores : 24 2025-05-07T19:43:00.8881872Z apicid : 12 2025-05-07T19:43:00.8882096Z initial apicid : 12 2025-05-07T19:43:00.8882310Z fpu : yes 2025-05-07T19:43:00.8882536Z fpu_exception : yes 2025-05-07T19:43:00.8882769Z cpuid level : 13 2025-05-07T19:43:00.8882975Z wp : yes 2025-05-07T19:43:00.8885184Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8887740Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8888313Z bogomips : 5999.99 2025-05-07T19:43:00.8888552Z clflush size : 64 2025-05-07T19:43:00.8888781Z cache_alignment : 64 2025-05-07T19:43:00.8889082Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8889414Z power management: 2025-05-07T19:43:00.8889573Z 2025-05-07T19:43:00.8889661Z processor : 7 2025-05-07T19:43:00.8889887Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8890144Z cpu family : 6 2025-05-07T19:43:00.8890365Z model : 85 2025-05-07T19:43:00.8890637Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8891074Z stepping : 7 2025-05-07T19:43:00.8891288Z microcode : 0x5003901 2025-05-07T19:43:00.8891547Z cpu MHz : 3252.040 2025-05-07T19:43:00.8891769Z cache size : 36608 KB 2025-05-07T19:43:00.8892023Z physical id : 0 2025-05-07T19:43:00.8892234Z siblings : 48 2025-05-07T19:43:00.8892462Z core id : 7 2025-05-07T19:43:00.8892665Z cpu cores : 24 2025-05-07T19:43:00.8892892Z apicid : 14 2025-05-07T19:43:00.8893101Z initial apicid : 14 2025-05-07T19:43:00.8893347Z fpu : yes 2025-05-07T19:43:00.8893568Z fpu_exception : yes 2025-05-07T19:43:00.8893790Z cpuid level : 13 2025-05-07T19:43:00.8894024Z wp : yes 2025-05-07T19:43:00.8896221Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8898780Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8899385Z bogomips : 5999.99 2025-05-07T19:43:00.8899610Z clflush size : 64 2025-05-07T19:43:00.8899858Z cache_alignment : 64 2025-05-07T19:43:00.8900129Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8900483Z power management: 2025-05-07T19:43:00.8900623Z 2025-05-07T19:43:00.8900716Z processor : 8 2025-05-07T19:43:00.8900962Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8901230Z cpu family : 6 2025-05-07T19:43:00.8901443Z model : 85 2025-05-07T19:43:00.8901743Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8902093Z stepping : 7 2025-05-07T19:43:00.8902336Z microcode : 0x5003901 2025-05-07T19:43:00.8902559Z cpu MHz : 3193.462 2025-05-07T19:43:00.8902791Z cache size : 36608 KB 2025-05-07T19:43:00.8903015Z physical id : 0 2025-05-07T19:43:00.8903327Z siblings : 48 2025-05-07T19:43:00.8903551Z core id : 8 2025-05-07T19:43:00.8903949Z cpu cores : 24 2025-05-07T19:43:00.8904245Z apicid : 16 2025-05-07T19:43:00.8904686Z initial apicid : 16 2025-05-07T19:43:00.8904916Z fpu : yes 2025-05-07T19:43:00.8905116Z fpu_exception : yes 2025-05-07T19:43:00.8905352Z cpuid level : 13 2025-05-07T19:43:00.8905561Z wp : yes 2025-05-07T19:43:00.8907828Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8910671Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8911263Z bogomips : 5999.99 2025-05-07T19:43:00.8911505Z clflush size : 64 2025-05-07T19:43:00.8911731Z cache_alignment : 64 2025-05-07T19:43:00.8912021Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8912351Z power management: 2025-05-07T19:43:00.8912498Z 2025-05-07T19:43:00.8912587Z processor : 9 2025-05-07T19:43:00.8912804Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8913061Z cpu family : 6 2025-05-07T19:43:00.8913281Z model : 85 2025-05-07T19:43:00.8913557Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8913927Z stepping : 7 2025-05-07T19:43:00.8914140Z microcode : 0x5003901 2025-05-07T19:43:00.8914500Z cpu MHz : 2999.996 2025-05-07T19:43:00.8914728Z cache size : 36608 KB 2025-05-07T19:43:00.8914979Z physical id : 0 2025-05-07T19:43:00.8915197Z siblings : 48 2025-05-07T19:43:00.8915419Z core id : 9 2025-05-07T19:43:00.8915626Z cpu cores : 24 2025-05-07T19:43:00.8915851Z apicid : 18 2025-05-07T19:43:00.8916064Z initial apicid : 18 2025-05-07T19:43:00.8916305Z fpu : yes 2025-05-07T19:43:00.8916523Z fpu_exception : yes 2025-05-07T19:43:00.8916745Z cpuid level : 13 2025-05-07T19:43:00.8916976Z wp : yes 2025-05-07T19:43:00.8919223Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8921885Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8922447Z bogomips : 5999.99 2025-05-07T19:43:00.8922656Z clflush size : 64 2025-05-07T19:43:00.8922880Z cache_alignment : 64 2025-05-07T19:43:00.8923137Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8923460Z power management: 2025-05-07T19:43:00.8923586Z 2025-05-07T19:43:00.8923675Z processor : 10 2025-05-07T19:43:00.8923896Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8924141Z cpu family : 6 2025-05-07T19:43:00.8924332Z model : 85 2025-05-07T19:43:00.8924604Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8924932Z stepping : 7 2025-05-07T19:43:00.8925146Z microcode : 0x5003901 2025-05-07T19:43:00.8925364Z cpu MHz : 3228.925 2025-05-07T19:43:00.8925589Z cache size : 36608 KB 2025-05-07T19:43:00.8925799Z physical id : 0 2025-05-07T19:43:00.8926012Z siblings : 48 2025-05-07T19:43:00.8926202Z core id : 10 2025-05-07T19:43:00.8926408Z cpu cores : 24 2025-05-07T19:43:00.8926601Z apicid : 20 2025-05-07T19:43:00.8926810Z initial apicid : 20 2025-05-07T19:43:00.8927120Z fpu : yes 2025-05-07T19:43:00.8927309Z fpu_exception : yes 2025-05-07T19:43:00.8927532Z cpuid level : 13 2025-05-07T19:43:00.8927735Z wp : yes 2025-05-07T19:43:00.8929833Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8932259Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8932800Z bogomips : 5999.99 2025-05-07T19:43:00.8933023Z clflush size : 64 2025-05-07T19:43:00.8933237Z cache_alignment : 64 2025-05-07T19:43:00.8933532Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8933863Z power management: 2025-05-07T19:43:00.8934023Z 2025-05-07T19:43:00.8934114Z processor : 11 2025-05-07T19:43:00.8934374Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8934617Z cpu family : 6 2025-05-07T19:43:00.8934856Z model : 85 2025-05-07T19:43:00.8935131Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8935513Z stepping : 7 2025-05-07T19:43:00.8935734Z microcode : 0x5003901 2025-05-07T19:43:00.8936002Z cpu MHz : 2999.996 2025-05-07T19:43:00.8936231Z cache size : 36608 KB 2025-05-07T19:43:00.8936553Z physical id : 0 2025-05-07T19:43:00.8936775Z siblings : 48 2025-05-07T19:43:00.8937025Z core id : 11 2025-05-07T19:43:00.8937241Z cpu cores : 24 2025-05-07T19:43:00.8937497Z apicid : 22 2025-05-07T19:43:00.8937719Z initial apicid : 22 2025-05-07T19:43:00.8937979Z fpu : yes 2025-05-07T19:43:00.8938234Z fpu_exception : yes 2025-05-07T19:43:00.8938463Z cpuid level : 13 2025-05-07T19:43:00.8938716Z wp : yes 2025-05-07T19:43:00.8940821Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8943589Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8944396Z bogomips : 5999.99 2025-05-07T19:43:00.8944693Z clflush size : 64 2025-05-07T19:43:00.8944957Z cache_alignment : 64 2025-05-07T19:43:00.8945240Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8945599Z power management: 2025-05-07T19:43:00.8945741Z 2025-05-07T19:43:00.8945836Z processor : 12 2025-05-07T19:43:00.8946085Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8946350Z cpu family : 6 2025-05-07T19:43:00.8946555Z model : 85 2025-05-07T19:43:00.8946846Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8947195Z stepping : 7 2025-05-07T19:43:00.8947428Z microcode : 0x5003901 2025-05-07T19:43:00.8947656Z cpu MHz : 3249.212 2025-05-07T19:43:00.8947904Z cache size : 36608 KB 2025-05-07T19:43:00.8948146Z physical id : 0 2025-05-07T19:43:00.8948387Z siblings : 48 2025-05-07T19:43:00.8948610Z core id : 12 2025-05-07T19:43:00.8948857Z cpu cores : 24 2025-05-07T19:43:00.8949082Z apicid : 24 2025-05-07T19:43:00.8949334Z initial apicid : 24 2025-05-07T19:43:00.8949596Z fpu : yes 2025-05-07T19:43:00.8949823Z fpu_exception : yes 2025-05-07T19:43:00.8950188Z cpuid level : 13 2025-05-07T19:43:00.8950430Z wp : yes 2025-05-07T19:43:00.8952731Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8955390Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8955996Z bogomips : 5999.99 2025-05-07T19:43:00.8956254Z clflush size : 64 2025-05-07T19:43:00.8956489Z cache_alignment : 64 2025-05-07T19:43:00.8956783Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8957173Z power management: 2025-05-07T19:43:00.8957321Z 2025-05-07T19:43:00.8957408Z processor : 13 2025-05-07T19:43:00.8957646Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8957890Z cpu family : 6 2025-05-07T19:43:00.8958112Z model : 85 2025-05-07T19:43:00.8958391Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8958761Z stepping : 7 2025-05-07T19:43:00.8958971Z microcode : 0x5003901 2025-05-07T19:43:00.8959217Z cpu MHz : 3167.716 2025-05-07T19:43:00.8959437Z cache size : 36608 KB 2025-05-07T19:43:00.8959680Z physical id : 0 2025-05-07T19:43:00.8959894Z siblings : 48 2025-05-07T19:43:00.8960115Z core id : 13 2025-05-07T19:43:00.8960391Z cpu cores : 24 2025-05-07T19:43:00.8960642Z apicid : 26 2025-05-07T19:43:00.8960870Z initial apicid : 26 2025-05-07T19:43:00.8961130Z fpu : yes 2025-05-07T19:43:00.8961386Z fpu_exception : yes 2025-05-07T19:43:00.8961627Z cpuid level : 13 2025-05-07T19:43:00.8961883Z wp : yes 2025-05-07T19:43:00.8964140Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8966794Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8967423Z bogomips : 5999.99 2025-05-07T19:43:00.8967661Z clflush size : 64 2025-05-07T19:43:00.8967932Z cache_alignment : 64 2025-05-07T19:43:00.8968222Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8968595Z power management: 2025-05-07T19:43:00.8968740Z 2025-05-07T19:43:00.8968838Z processor : 14 2025-05-07T19:43:00.8969217Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8969495Z cpu family : 6 2025-05-07T19:43:00.8969714Z model : 85 2025-05-07T19:43:00.8970022Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8970388Z stepping : 7 2025-05-07T19:43:00.8970638Z microcode : 0x5003901 2025-05-07T19:43:00.8970880Z cpu MHz : 3240.287 2025-05-07T19:43:00.8971137Z cache size : 36608 KB 2025-05-07T19:43:00.8971379Z physical id : 0 2025-05-07T19:43:00.8971626Z siblings : 48 2025-05-07T19:43:00.8971844Z core id : 14 2025-05-07T19:43:00.8972089Z cpu cores : 24 2025-05-07T19:43:00.8972317Z apicid : 28 2025-05-07T19:43:00.8972560Z initial apicid : 28 2025-05-07T19:43:00.8972823Z fpu : yes 2025-05-07T19:43:00.8973051Z fpu_exception : yes 2025-05-07T19:43:00.8973309Z cpuid level : 13 2025-05-07T19:43:00.8973543Z wp : yes 2025-05-07T19:43:00.8975804Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8978333Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8978889Z bogomips : 5999.99 2025-05-07T19:43:00.8979143Z clflush size : 64 2025-05-07T19:43:00.8979552Z cache_alignment : 64 2025-05-07T19:43:00.8979952Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8980293Z power management: 2025-05-07T19:43:00.8980466Z 2025-05-07T19:43:00.8980568Z processor : 15 2025-05-07T19:43:00.8980834Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8981098Z cpu family : 6 2025-05-07T19:43:00.8981357Z model : 85 2025-05-07T19:43:00.8981654Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8982053Z stepping : 7 2025-05-07T19:43:00.8982288Z microcode : 0x5003901 2025-05-07T19:43:00.8982569Z cpu MHz : 2999.996 2025-05-07T19:43:00.8982815Z cache size : 36608 KB 2025-05-07T19:43:00.8983096Z physical id : 0 2025-05-07T19:43:00.8983422Z siblings : 48 2025-05-07T19:43:00.8983682Z core id : 15 2025-05-07T19:43:00.8984087Z cpu cores : 24 2025-05-07T19:43:00.8984360Z apicid : 30 2025-05-07T19:43:00.8984728Z initial apicid : 30 2025-05-07T19:43:00.8984998Z fpu : yes 2025-05-07T19:43:00.8985252Z fpu_exception : yes 2025-05-07T19:43:00.8985505Z cpuid level : 13 2025-05-07T19:43:00.8985773Z wp : yes 2025-05-07T19:43:00.8988046Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.8990710Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.8991349Z bogomips : 5999.99 2025-05-07T19:43:00.8991598Z clflush size : 64 2025-05-07T19:43:00.8991859Z cache_alignment : 64 2025-05-07T19:43:00.8992152Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.8992526Z power management: 2025-05-07T19:43:00.8992667Z 2025-05-07T19:43:00.8992767Z processor : 16 2025-05-07T19:43:00.8993029Z vendor_id : GenuineIntel 2025-05-07T19:43:00.8993316Z cpu family : 6 2025-05-07T19:43:00.8993539Z model : 85 2025-05-07T19:43:00.8993854Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.8994221Z stepping : 7 2025-05-07T19:43:00.8994473Z microcode : 0x5003901 2025-05-07T19:43:00.8994722Z cpu MHz : 3192.340 2025-05-07T19:43:00.8994986Z cache size : 36608 KB 2025-05-07T19:43:00.8995231Z physical id : 0 2025-05-07T19:43:00.8995485Z siblings : 48 2025-05-07T19:43:00.8995715Z core id : 16 2025-05-07T19:43:00.8995976Z cpu cores : 24 2025-05-07T19:43:00.8996333Z apicid : 32 2025-05-07T19:43:00.8996581Z initial apicid : 32 2025-05-07T19:43:00.8996836Z fpu : yes 2025-05-07T19:43:00.8997054Z fpu_exception : yes 2025-05-07T19:43:00.8997308Z cpuid level : 13 2025-05-07T19:43:00.8997529Z wp : yes 2025-05-07T19:43:00.8999754Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9002631Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9003229Z bogomips : 5999.99 2025-05-07T19:43:00.9003500Z clflush size : 64 2025-05-07T19:43:00.9003749Z cache_alignment : 64 2025-05-07T19:43:00.9004079Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9004433Z power management: 2025-05-07T19:43:00.9004604Z 2025-05-07T19:43:00.9004707Z processor : 17 2025-05-07T19:43:00.9004976Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9005240Z cpu family : 6 2025-05-07T19:43:00.9005650Z model : 85 2025-05-07T19:43:00.9005942Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9006338Z stepping : 7 2025-05-07T19:43:00.9006563Z microcode : 0x5003901 2025-05-07T19:43:00.9006834Z cpu MHz : 3198.697 2025-05-07T19:43:00.9007075Z cache size : 36608 KB 2025-05-07T19:43:00.9007349Z physical id : 0 2025-05-07T19:43:00.9007581Z siblings : 48 2025-05-07T19:43:00.9007828Z core id : 17 2025-05-07T19:43:00.9008049Z cpu cores : 24 2025-05-07T19:43:00.9008302Z apicid : 34 2025-05-07T19:43:00.9008559Z initial apicid : 34 2025-05-07T19:43:00.9008947Z fpu : yes 2025-05-07T19:43:00.9009274Z fpu_exception : yes 2025-05-07T19:43:00.9009611Z cpuid level : 13 2025-05-07T19:43:00.9009866Z wp : yes 2025-05-07T19:43:00.9012124Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9014767Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9015388Z bogomips : 5999.99 2025-05-07T19:43:00.9015625Z clflush size : 64 2025-05-07T19:43:00.9015891Z cache_alignment : 64 2025-05-07T19:43:00.9016178Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9016544Z power management: 2025-05-07T19:43:00.9016689Z 2025-05-07T19:43:00.9016785Z processor : 18 2025-05-07T19:43:00.9017043Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9017327Z cpu family : 6 2025-05-07T19:43:00.9017549Z model : 85 2025-05-07T19:43:00.9017870Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9018240Z stepping : 7 2025-05-07T19:43:00.9018493Z microcode : 0x5003901 2025-05-07T19:43:00.9018741Z cpu MHz : 3764.216 2025-05-07T19:43:00.9019006Z cache size : 36608 KB 2025-05-07T19:43:00.9019251Z physical id : 0 2025-05-07T19:43:00.9019520Z siblings : 48 2025-05-07T19:43:00.9019756Z core id : 18 2025-05-07T19:43:00.9020008Z cpu cores : 24 2025-05-07T19:43:00.9020231Z apicid : 36 2025-05-07T19:43:00.9020470Z initial apicid : 36 2025-05-07T19:43:00.9020699Z fpu : yes 2025-05-07T19:43:00.9020901Z fpu_exception : yes 2025-05-07T19:43:00.9021142Z cpuid level : 13 2025-05-07T19:43:00.9021353Z wp : yes 2025-05-07T19:43:00.9023683Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9026429Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9027015Z bogomips : 5999.99 2025-05-07T19:43:00.9027298Z clflush size : 64 2025-05-07T19:43:00.9027610Z cache_alignment : 64 2025-05-07T19:43:00.9027921Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9028253Z power management: 2025-05-07T19:43:00.9028408Z 2025-05-07T19:43:00.9028499Z processor : 19 2025-05-07T19:43:00.9028741Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9028986Z cpu family : 6 2025-05-07T19:43:00.9029214Z model : 85 2025-05-07T19:43:00.9029496Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9029890Z stepping : 7 2025-05-07T19:43:00.9030104Z microcode : 0x5003901 2025-05-07T19:43:00.9054868Z cpu MHz : 3212.309 2025-05-07T19:43:00.9055209Z cache size : 36608 KB 2025-05-07T19:43:00.9055516Z physical id : 0 2025-05-07T19:43:00.9055735Z siblings : 48 2025-05-07T19:43:00.9055990Z core id : 19 2025-05-07T19:43:00.9056186Z cpu cores : 24 2025-05-07T19:43:00.9056569Z apicid : 38 2025-05-07T19:43:00.9057103Z initial apicid : 38 2025-05-07T19:43:00.9057346Z fpu : yes 2025-05-07T19:43:00.9057567Z fpu_exception : yes 2025-05-07T19:43:00.9057791Z cpuid level : 13 2025-05-07T19:43:00.9058020Z wp : yes 2025-05-07T19:43:00.9060529Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9063162Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9063892Z bogomips : 5999.99 2025-05-07T19:43:00.9064116Z clflush size : 64 2025-05-07T19:43:00.9064412Z cache_alignment : 64 2025-05-07T19:43:00.9064691Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9065028Z power management: 2025-05-07T19:43:00.9065168Z 2025-05-07T19:43:00.9065275Z processor : 20 2025-05-07T19:43:00.9065497Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9065753Z cpu family : 6 2025-05-07T19:43:00.9065958Z model : 85 2025-05-07T19:43:00.9066250Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9066608Z stepping : 7 2025-05-07T19:43:00.9066833Z microcode : 0x5003901 2025-05-07T19:43:00.9067062Z cpu MHz : 3193.067 2025-05-07T19:43:00.9067294Z cache size : 36608 KB 2025-05-07T19:43:00.9067538Z physical id : 0 2025-05-07T19:43:00.9067747Z siblings : 48 2025-05-07T19:43:00.9067961Z core id : 20 2025-05-07T19:43:00.9068163Z cpu cores : 24 2025-05-07T19:43:00.9068361Z apicid : 40 2025-05-07T19:43:00.9068568Z initial apicid : 40 2025-05-07T19:43:00.9068778Z fpu : yes 2025-05-07T19:43:00.9068966Z fpu_exception : yes 2025-05-07T19:43:00.9069205Z cpuid level : 13 2025-05-07T19:43:00.9069415Z wp : yes 2025-05-07T19:43:00.9071667Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9074420Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9074999Z bogomips : 5999.99 2025-05-07T19:43:00.9075226Z clflush size : 64 2025-05-07T19:43:00.9075439Z cache_alignment : 64 2025-05-07T19:43:00.9075706Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9076143Z power management: 2025-05-07T19:43:00.9076290Z 2025-05-07T19:43:00.9076370Z processor : 21 2025-05-07T19:43:00.9076580Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9076816Z cpu family : 6 2025-05-07T19:43:00.9077006Z model : 85 2025-05-07T19:43:00.9077277Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9077638Z stepping : 7 2025-05-07T19:43:00.9077835Z microcode : 0x5003901 2025-05-07T19:43:00.9078063Z cpu MHz : 3181.945 2025-05-07T19:43:00.9078268Z cache size : 36608 KB 2025-05-07T19:43:00.9078507Z physical id : 0 2025-05-07T19:43:00.9078701Z siblings : 48 2025-05-07T19:43:00.9078906Z core id : 21 2025-05-07T19:43:00.9079097Z cpu cores : 24 2025-05-07T19:43:00.9079304Z apicid : 42 2025-05-07T19:43:00.9079500Z initial apicid : 42 2025-05-07T19:43:00.9079715Z fpu : yes 2025-05-07T19:43:00.9079900Z fpu_exception : yes 2025-05-07T19:43:00.9080113Z cpuid level : 13 2025-05-07T19:43:00.9080329Z wp : yes 2025-05-07T19:43:00.9082568Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9085110Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9085682Z bogomips : 5999.99 2025-05-07T19:43:00.9085887Z clflush size : 64 2025-05-07T19:43:00.9086102Z cache_alignment : 64 2025-05-07T19:43:00.9086361Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9086684Z power management: 2025-05-07T19:43:00.9086812Z 2025-05-07T19:43:00.9086896Z processor : 22 2025-05-07T19:43:00.9087114Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9087340Z cpu family : 6 2025-05-07T19:43:00.9087542Z model : 85 2025-05-07T19:43:00.9087810Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9088144Z stepping : 7 2025-05-07T19:43:00.9088355Z microcode : 0x5003901 2025-05-07T19:43:00.9088570Z cpu MHz : 3206.824 2025-05-07T19:43:00.9088786Z cache size : 36608 KB 2025-05-07T19:43:00.9088998Z physical id : 0 2025-05-07T19:43:00.9089214Z siblings : 48 2025-05-07T19:43:00.9089398Z core id : 22 2025-05-07T19:43:00.9089602Z cpu cores : 24 2025-05-07T19:43:00.9089794Z apicid : 44 2025-05-07T19:43:00.9089999Z initial apicid : 44 2025-05-07T19:43:00.9090201Z fpu : yes 2025-05-07T19:43:00.9090394Z fpu_exception : yes 2025-05-07T19:43:00.9090607Z cpuid level : 13 2025-05-07T19:43:00.9090808Z wp : yes 2025-05-07T19:43:00.9092999Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9095602Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9096163Z bogomips : 5999.99 2025-05-07T19:43:00.9096383Z clflush size : 64 2025-05-07T19:43:00.9096595Z cache_alignment : 64 2025-05-07T19:43:00.9096875Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9097190Z power management: 2025-05-07T19:43:00.9097335Z 2025-05-07T19:43:00.9097418Z processor : 23 2025-05-07T19:43:00.9097631Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9097872Z cpu family : 6 2025-05-07T19:43:00.9098078Z model : 85 2025-05-07T19:43:00.9098343Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9098692Z stepping : 7 2025-05-07T19:43:00.9098887Z microcode : 0x5003901 2025-05-07T19:43:00.9099123Z cpu MHz : 2999.996 2025-05-07T19:43:00.9099335Z cache size : 36608 KB 2025-05-07T19:43:00.9099566Z physical id : 0 2025-05-07T19:43:00.9099772Z siblings : 48 2025-05-07T19:43:00.9099976Z core id : 23 2025-05-07T19:43:00.9100169Z cpu cores : 24 2025-05-07T19:43:00.9100380Z apicid : 46 2025-05-07T19:43:00.9100574Z initial apicid : 46 2025-05-07T19:43:00.9100792Z fpu : yes 2025-05-07T19:43:00.9100989Z fpu_exception : yes 2025-05-07T19:43:00.9101208Z cpuid level : 13 2025-05-07T19:43:00.9101421Z wp : yes 2025-05-07T19:43:00.9103946Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9106575Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9107169Z bogomips : 5999.99 2025-05-07T19:43:00.9107381Z clflush size : 64 2025-05-07T19:43:00.9107612Z cache_alignment : 64 2025-05-07T19:43:00.9107879Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9108206Z power management: 2025-05-07T19:43:00.9108339Z 2025-05-07T19:43:00.9108417Z processor : 24 2025-05-07T19:43:00.9108848Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9109299Z cpu family : 6 2025-05-07T19:43:00.9109504Z model : 85 2025-05-07T19:43:00.9109779Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9110110Z stepping : 7 2025-05-07T19:43:00.9110324Z microcode : 0x5003901 2025-05-07T19:43:00.9110542Z cpu MHz : 2999.996 2025-05-07T19:43:00.9110763Z cache size : 36608 KB 2025-05-07T19:43:00.9110985Z physical id : 1 2025-05-07T19:43:00.9111214Z siblings : 48 2025-05-07T19:43:00.9111414Z core id : 0 2025-05-07T19:43:00.9111614Z cpu cores : 24 2025-05-07T19:43:00.9111813Z apicid : 64 2025-05-07T19:43:00.9112018Z initial apicid : 64 2025-05-07T19:43:00.9112225Z fpu : yes 2025-05-07T19:43:00.9112429Z fpu_exception : yes 2025-05-07T19:43:00.9112657Z cpuid level : 13 2025-05-07T19:43:00.9112860Z wp : yes 2025-05-07T19:43:00.9115109Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9120369Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9121049Z bogomips : 5999.99 2025-05-07T19:43:00.9121395Z clflush size : 64 2025-05-07T19:43:00.9121594Z cache_alignment : 64 2025-05-07T19:43:00.9121859Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9122157Z power management: 2025-05-07T19:43:00.9122295Z 2025-05-07T19:43:00.9122373Z processor : 25 2025-05-07T19:43:00.9122571Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9122804Z cpu family : 6 2025-05-07T19:43:00.9123001Z model : 85 2025-05-07T19:43:00.9123267Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9123594Z stepping : 7 2025-05-07T19:43:00.9123801Z microcode : 0x5003901 2025-05-07T19:43:00.9124017Z cpu MHz : 1554.881 2025-05-07T19:43:00.9124229Z cache size : 36608 KB 2025-05-07T19:43:00.9124451Z physical id : 1 2025-05-07T19:43:00.9124663Z siblings : 48 2025-05-07T19:43:00.9124850Z core id : 1 2025-05-07T19:43:00.9125049Z cpu cores : 24 2025-05-07T19:43:00.9125250Z apicid : 66 2025-05-07T19:43:00.9125442Z initial apicid : 66 2025-05-07T19:43:00.9125659Z fpu : yes 2025-05-07T19:43:00.9125848Z fpu_exception : yes 2025-05-07T19:43:00.9126063Z cpuid level : 13 2025-05-07T19:43:00.9126253Z wp : yes 2025-05-07T19:43:00.9128429Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9130848Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9131385Z bogomips : 5999.99 2025-05-07T19:43:00.9131599Z clflush size : 64 2025-05-07T19:43:00.9131799Z cache_alignment : 64 2025-05-07T19:43:00.9132063Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9132373Z power management: 2025-05-07T19:43:00.9132511Z 2025-05-07T19:43:00.9132594Z processor : 26 2025-05-07T19:43:00.9132814Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9133043Z cpu family : 6 2025-05-07T19:43:00.9133253Z model : 85 2025-05-07T19:43:00.9133519Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9133859Z stepping : 7 2025-05-07T19:43:00.9134057Z microcode : 0x5003901 2025-05-07T19:43:00.9134288Z cpu MHz : 1817.147 2025-05-07T19:43:00.9134502Z cache size : 36608 KB 2025-05-07T19:43:00.9134716Z physical id : 1 2025-05-07T19:43:00.9134915Z siblings : 48 2025-05-07T19:43:00.9135123Z core id : 2 2025-05-07T19:43:00.9135314Z cpu cores : 24 2025-05-07T19:43:00.9135509Z apicid : 68 2025-05-07T19:43:00.9135719Z initial apicid : 68 2025-05-07T19:43:00.9135922Z fpu : yes 2025-05-07T19:43:00.9136123Z fpu_exception : yes 2025-05-07T19:43:00.9136316Z cpuid level : 13 2025-05-07T19:43:00.9136516Z wp : yes 2025-05-07T19:43:00.9138593Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9141060Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9141600Z bogomips : 5999.99 2025-05-07T19:43:00.9141806Z clflush size : 64 2025-05-07T19:43:00.9142034Z cache_alignment : 64 2025-05-07T19:43:00.9142280Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9142593Z power management: 2025-05-07T19:43:00.9142718Z 2025-05-07T19:43:00.9142813Z processor : 27 2025-05-07T19:43:00.9143021Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9143341Z cpu family : 6 2025-05-07T19:43:00.9143540Z model : 85 2025-05-07T19:43:00.9143993Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9144343Z stepping : 7 2025-05-07T19:43:00.9144572Z microcode : 0x5003901 2025-05-07T19:43:00.9144808Z cpu MHz : 1424.497 2025-05-07T19:43:00.9145040Z cache size : 36608 KB 2025-05-07T19:43:00.9145270Z physical id : 1 2025-05-07T19:43:00.9145495Z siblings : 48 2025-05-07T19:43:00.9145702Z core id : 3 2025-05-07T19:43:00.9145920Z cpu cores : 24 2025-05-07T19:43:00.9146130Z apicid : 70 2025-05-07T19:43:00.9146330Z initial apicid : 70 2025-05-07T19:43:00.9146559Z fpu : yes 2025-05-07T19:43:00.9146759Z fpu_exception : yes 2025-05-07T19:43:00.9146983Z cpuid level : 13 2025-05-07T19:43:00.9147191Z wp : yes 2025-05-07T19:43:00.9149528Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9152151Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9152730Z bogomips : 5999.99 2025-05-07T19:43:00.9152955Z clflush size : 64 2025-05-07T19:43:00.9153177Z cache_alignment : 64 2025-05-07T19:43:00.9153467Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9153791Z power management: 2025-05-07T19:43:00.9153937Z 2025-05-07T19:43:00.9154017Z processor : 28 2025-05-07T19:43:00.9154241Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9154488Z cpu family : 6 2025-05-07T19:43:00.9154704Z model : 85 2025-05-07T19:43:00.9154982Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9155342Z stepping : 7 2025-05-07T19:43:00.9155557Z microcode : 0x5003901 2025-05-07T19:43:00.9155795Z cpu MHz : 2999.996 2025-05-07T19:43:00.9156122Z cache size : 36608 KB 2025-05-07T19:43:00.9156345Z physical id : 1 2025-05-07T19:43:00.9156534Z siblings : 48 2025-05-07T19:43:00.9156729Z core id : 4 2025-05-07T19:43:00.9156918Z cpu cores : 24 2025-05-07T19:43:00.9157128Z apicid : 72 2025-05-07T19:43:00.9157334Z initial apicid : 72 2025-05-07T19:43:00.9157529Z fpu : yes 2025-05-07T19:43:00.9157725Z fpu_exception : yes 2025-05-07T19:43:00.9157928Z cpuid level : 13 2025-05-07T19:43:00.9158137Z wp : yes 2025-05-07T19:43:00.9160219Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9162622Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9163219Z bogomips : 5999.99 2025-05-07T19:43:00.9163412Z clflush size : 64 2025-05-07T19:43:00.9163609Z cache_alignment : 64 2025-05-07T19:43:00.9163849Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9164146Z power management: 2025-05-07T19:43:00.9164267Z 2025-05-07T19:43:00.9164353Z processor : 29 2025-05-07T19:43:00.9164541Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9164756Z cpu family : 6 2025-05-07T19:43:00.9164939Z model : 85 2025-05-07T19:43:00.9165194Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9165512Z stepping : 7 2025-05-07T19:43:00.9165701Z microcode : 0x5003901 2025-05-07T19:43:00.9165906Z cpu MHz : 1456.902 2025-05-07T19:43:00.9166108Z cache size : 36608 KB 2025-05-07T19:43:00.9166312Z physical id : 1 2025-05-07T19:43:00.9166505Z siblings : 48 2025-05-07T19:43:00.9166685Z core id : 5 2025-05-07T19:43:00.9166877Z cpu cores : 24 2025-05-07T19:43:00.9167072Z apicid : 74 2025-05-07T19:43:00.9167272Z initial apicid : 74 2025-05-07T19:43:00.9167496Z fpu : yes 2025-05-07T19:43:00.9167690Z fpu_exception : yes 2025-05-07T19:43:00.9167920Z cpuid level : 13 2025-05-07T19:43:00.9168121Z wp : yes 2025-05-07T19:43:00.9170295Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9172717Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9173261Z bogomips : 5999.99 2025-05-07T19:43:00.9173495Z clflush size : 64 2025-05-07T19:43:00.9173705Z cache_alignment : 64 2025-05-07T19:43:00.9173984Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9174287Z power management: 2025-05-07T19:43:00.9174426Z 2025-05-07T19:43:00.9174508Z processor : 30 2025-05-07T19:43:00.9174722Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9174944Z cpu family : 6 2025-05-07T19:43:00.9175147Z model : 85 2025-05-07T19:43:00.9175402Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9175740Z stepping : 7 2025-05-07T19:43:00.9175935Z microcode : 0x5003901 2025-05-07T19:43:00.9176162Z cpu MHz : 2999.996 2025-05-07T19:43:00.9176366Z cache size : 36608 KB 2025-05-07T19:43:00.9176589Z physical id : 1 2025-05-07T19:43:00.9176786Z siblings : 48 2025-05-07T19:43:00.9176986Z core id : 6 2025-05-07T19:43:00.9177190Z cpu cores : 24 2025-05-07T19:43:00.9177382Z apicid : 76 2025-05-07T19:43:00.9177592Z initial apicid : 76 2025-05-07T19:43:00.9177799Z fpu : yes 2025-05-07T19:43:00.9178008Z fpu_exception : yes 2025-05-07T19:43:00.9178208Z cpuid level : 13 2025-05-07T19:43:00.9178420Z wp : yes 2025-05-07T19:43:00.9180496Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9182912Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9183558Z bogomips : 5999.99 2025-05-07T19:43:00.9183947Z clflush size : 64 2025-05-07T19:43:00.9184254Z cache_alignment : 64 2025-05-07T19:43:00.9184580Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9184925Z power management: 2025-05-07T19:43:00.9185060Z 2025-05-07T19:43:00.9185167Z processor : 31 2025-05-07T19:43:00.9185388Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9185651Z cpu family : 6 2025-05-07T19:43:00.9185859Z model : 85 2025-05-07T19:43:00.9186161Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9186510Z stepping : 7 2025-05-07T19:43:00.9186734Z microcode : 0x5003901 2025-05-07T19:43:00.9186961Z cpu MHz : 1627.225 2025-05-07T19:43:00.9187193Z cache size : 36608 KB 2025-05-07T19:43:00.9187418Z physical id : 1 2025-05-07T19:43:00.9187647Z siblings : 48 2025-05-07T19:43:00.9187850Z core id : 7 2025-05-07T19:43:00.9188073Z cpu cores : 24 2025-05-07T19:43:00.9188310Z apicid : 78 2025-05-07T19:43:00.9188530Z initial apicid : 78 2025-05-07T19:43:00.9188790Z fpu : yes 2025-05-07T19:43:00.9189004Z fpu_exception : yes 2025-05-07T19:43:00.9189270Z cpuid level : 13 2025-05-07T19:43:00.9189498Z wp : yes 2025-05-07T19:43:00.9191839Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9194488Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9195088Z bogomips : 5999.99 2025-05-07T19:43:00.9195351Z clflush size : 64 2025-05-07T19:43:00.9195590Z cache_alignment : 64 2025-05-07T19:43:00.9195908Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9196348Z power management: 2025-05-07T19:43:00.9196510Z 2025-05-07T19:43:00.9196600Z processor : 32 2025-05-07T19:43:00.9196843Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9197083Z cpu family : 6 2025-05-07T19:43:00.9197316Z model : 85 2025-05-07T19:43:00.9197588Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9197952Z stepping : 7 2025-05-07T19:43:00.9198165Z microcode : 0x5003901 2025-05-07T19:43:00.9198417Z cpu MHz : 2999.996 2025-05-07T19:43:00.9198641Z cache size : 36608 KB 2025-05-07T19:43:00.9198906Z physical id : 1 2025-05-07T19:43:00.9199130Z siblings : 48 2025-05-07T19:43:00.9199372Z core id : 8 2025-05-07T19:43:00.9199612Z cpu cores : 24 2025-05-07T19:43:00.9199827Z apicid : 80 2025-05-07T19:43:00.9200064Z initial apicid : 80 2025-05-07T19:43:00.9200287Z fpu : yes 2025-05-07T19:43:00.9200518Z fpu_exception : yes 2025-05-07T19:43:00.9200743Z cpuid level : 13 2025-05-07T19:43:00.9200985Z wp : yes 2025-05-07T19:43:00.9203076Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9205521Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9206101Z bogomips : 5999.99 2025-05-07T19:43:00.9206301Z clflush size : 64 2025-05-07T19:43:00.9206507Z cache_alignment : 64 2025-05-07T19:43:00.9206755Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9207129Z power management: 2025-05-07T19:43:00.9207251Z 2025-05-07T19:43:00.9207344Z processor : 33 2025-05-07T19:43:00.9207540Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9207773Z cpu family : 6 2025-05-07T19:43:00.9207951Z model : 85 2025-05-07T19:43:00.9208213Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9208524Z stepping : 7 2025-05-07T19:43:00.9208916Z microcode : 0x5003901 2025-05-07T19:43:00.9209300Z cpu MHz : 1930.865 2025-05-07T19:43:00.9209525Z cache size : 36608 KB 2025-05-07T19:43:00.9209809Z physical id : 1 2025-05-07T19:43:00.9210218Z siblings : 48 2025-05-07T19:43:00.9210414Z core id : 9 2025-05-07T19:43:00.9210619Z cpu cores : 24 2025-05-07T19:43:00.9210829Z apicid : 82 2025-05-07T19:43:00.9211028Z initial apicid : 82 2025-05-07T19:43:00.9211249Z fpu : yes 2025-05-07T19:43:00.9211440Z fpu_exception : yes 2025-05-07T19:43:00.9211660Z cpuid level : 13 2025-05-07T19:43:00.9211863Z wp : yes 2025-05-07T19:43:00.9214111Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9216835Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9217402Z bogomips : 5999.99 2025-05-07T19:43:00.9217620Z clflush size : 64 2025-05-07T19:43:00.9217826Z cache_alignment : 64 2025-05-07T19:43:00.9218097Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9218819Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:00.9219132Z power management: 2025-05-07T19:43:00.9219259Z 2025-05-07T19:43:00.9219352Z processor : 34 2025-05-07T19:43:00.9219557Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9219795Z cpu family : 6 2025-05-07T19:43:00.9219985Z model : 85 2025-05-07T19:43:00.9220258Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9220596Z stepping : 7 2025-05-07T19:43:00.9220801Z microcode : 0x5003901 2025-05-07T19:43:00.9221015Z cpu MHz : 1657.680 2025-05-07T19:43:00.9221230Z cache size : 36608 KB 2025-05-07T19:43:00.9221564Z physical id : 1 2025-05-07T19:43:00.9221755Z siblings : 48 2025-05-07T19:43:00.9221955Z core id : 10 2025-05-07T19:43:00.9222142Z cpu cores : 24 2025-05-07T19:43:00.9222342Z apicid : 84 2025-05-07T19:43:00.9222528Z initial apicid : 84 2025-05-07T19:43:00.9222744Z fpu : yes 2025-05-07T19:43:00.9222924Z fpu_exception : yes 2025-05-07T19:43:00.9223018Z cpuid level : 13 2025-05-07T19:43:00.9223097Z wp : yes 2025-05-07T19:43:00.9225453Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw 
avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.9225860Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9225946Z bogomips : 5999.99 2025-05-07T19:43:00.9226026Z clflush size : 64 2025-05-07T19:43:00.9226121Z cache_alignment : 64 2025-05-07T19:43:00.9226253Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9226425Z power management: 2025-05-07T19:43:00.9226430Z 2025-05-07T19:43:00.9226511Z processor : 35 2025-05-07T19:43:00.9226615Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9226698Z cpu family : 6 2025-05-07T19:43:00.9226775Z model : 85 2025-05-07T19:43:00.9226948Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9227032Z stepping : 7 2025-05-07T19:43:00.9227116Z microcode : 0x5003901 2025-05-07T19:43:00.9227197Z cpu MHz : 1515.798 2025-05-07T19:43:00.9227290Z cache size : 36608 KB 2025-05-07T19:43:00.9227372Z physical id : 1 2025-05-07T19:43:00.9227448Z siblings : 48 2025-05-07T19:43:00.9227539Z core id : 11 2025-05-07T19:43:00.9227622Z cpu cores : 24 2025-05-07T19:43:00.9227706Z apicid : 86 2025-05-07T19:43:00.9227789Z initial apicid : 86 2025-05-07T19:43:00.9227877Z fpu : yes 2025-05-07T19:43:00.9227962Z fpu_exception : yes 2025-05-07T19:43:00.9228044Z cpuid level : 13 2025-05-07T19:43:00.9228128Z wp : yes 2025-05-07T19:43:00.9230253Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl 
xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.9230716Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9230812Z bogomips : 5999.99 2025-05-07T19:43:00.9230895Z clflush size : 64 2025-05-07T19:43:00.9230979Z cache_alignment : 64 2025-05-07T19:43:00.9231118Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9231205Z power management: 2025-05-07T19:43:00.9231214Z 2025-05-07T19:43:00.9231297Z processor : 36 2025-05-07T19:43:00.9231386Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9231480Z cpu family : 6 2025-05-07T19:43:00.9231560Z model : 85 2025-05-07T19:43:00.9231718Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9231810Z stepping : 7 2025-05-07T19:43:00.9231897Z microcode : 0x5003901 2025-05-07T19:43:00.9231980Z cpu MHz : 1551.356 2025-05-07T19:43:00.9232060Z cache size : 36608 KB 2025-05-07T19:43:00.9232152Z physical id : 1 2025-05-07T19:43:00.9232234Z siblings : 48 2025-05-07T19:43:00.9232312Z core id : 12 2025-05-07T19:43:00.9232405Z cpu cores : 24 2025-05-07T19:43:00.9232484Z apicid : 88 2025-05-07T19:43:00.9232573Z initial apicid : 88 2025-05-07T19:43:00.9232652Z fpu : yes 2025-05-07T19:43:00.9232743Z fpu_exception : yes 2025-05-07T19:43:00.9232826Z cpuid level : 13 2025-05-07T19:43:00.9232904Z wp : yes 2025-05-07T19:43:00.9235040Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt 
xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.9235422Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9235512Z bogomips : 5999.99 2025-05-07T19:43:00.9235603Z clflush size : 64 2025-05-07T19:43:00.9235685Z cache_alignment : 64 2025-05-07T19:43:00.9235818Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9236030Z power management: 2025-05-07T19:43:00.9236035Z 2025-05-07T19:43:00.9236162Z processor : 37 2025-05-07T19:43:00.9236244Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9236319Z cpu family : 6 2025-05-07T19:43:00.9236406Z model : 85 2025-05-07T19:43:00.9236549Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9236624Z stepping : 7 2025-05-07T19:43:00.9236718Z microcode : 0x5003901 2025-05-07T19:43:00.9236790Z cpu MHz : 1475.102 2025-05-07T19:43:00.9236867Z cache size : 36608 KB 2025-05-07T19:43:00.9236942Z physical id : 1 2025-05-07T19:43:00.9237024Z siblings : 48 2025-05-07T19:43:00.9237095Z core id : 13 2025-05-07T19:43:00.9237171Z cpu cores : 24 2025-05-07T19:43:00.9237243Z apicid : 90 2025-05-07T19:43:00.9237334Z initial apicid : 90 2025-05-07T19:43:00.9237406Z fpu : yes 2025-05-07T19:43:00.9237485Z fpu_exception : yes 2025-05-07T19:43:00.9237572Z cpuid level : 13 2025-05-07T19:43:00.9237643Z wp : yes 2025-05-07T19:43:00.9239606Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec 
xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.9239978Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9240051Z bogomips : 5999.99 2025-05-07T19:43:00.9240176Z clflush size : 64 2025-05-07T19:43:00.9240269Z cache_alignment : 64 2025-05-07T19:43:00.9240387Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9240465Z power management: 2025-05-07T19:43:00.9240469Z 2025-05-07T19:43:00.9240556Z processor : 38 2025-05-07T19:43:00.9240652Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9240731Z cpu family : 6 2025-05-07T19:43:00.9240808Z model : 85 2025-05-07T19:43:00.9240980Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9241057Z stepping : 7 2025-05-07T19:43:00.9241135Z microcode : 0x5003901 2025-05-07T19:43:00.9241229Z cpu MHz : 2999.996 2025-05-07T19:43:00.9241317Z cache size : 36608 KB 2025-05-07T19:43:00.9241394Z physical id : 1 2025-05-07T19:43:00.9241470Z siblings : 48 2025-05-07T19:43:00.9241571Z core id : 14 2025-05-07T19:43:00.9241656Z cpu cores : 24 2025-05-07T19:43:00.9241736Z apicid : 92 2025-05-07T19:43:00.9241815Z initial apicid : 92 2025-05-07T19:43:00.9241910Z fpu : yes 2025-05-07T19:43:00.9242003Z fpu_exception : yes 2025-05-07T19:43:00.9242081Z cpuid level : 13 2025-05-07T19:43:00.9242178Z wp : yes 2025-05-07T19:43:00.9244150Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 
xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:43:00.9244515Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9244610Z bogomips : 5999.99 2025-05-07T19:43:00.9244697Z clflush size : 64 2025-05-07T19:43:00.9244787Z cache_alignment : 64 2025-05-07T19:43:00.9244921Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9245008Z power management: 2025-05-07T19:43:00.9245012Z 2025-05-07T19:43:00.9245099Z processor : 39 2025-05-07T19:43:00.9245204Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9245332Z cpu family : 6 2025-05-07T19:43:00.9245408Z model : 85 2025-05-07T19:43:00.9245558Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9245652Z stepping : 7 2025-05-07T19:43:00.9245736Z microcode : 0x5003901 2025-05-07T19:43:00.9245821Z cpu MHz : 2999.996 2025-05-07T19:43:00.9245903Z cache size : 36608 KB 2025-05-07T19:43:00.9245999Z physical id : 1 2025-05-07T19:43:00.9246078Z siblings : 48 2025-05-07T19:43:00.9246162Z core id : 15 2025-05-07T19:43:00.9246257Z cpu cores : 24 2025-05-07T19:43:00.9246338Z apicid : 94 2025-05-07T19:43:00.9246428Z initial apicid : 94 2025-05-07T19:43:00.9246505Z fpu : yes 2025-05-07T19:43:00.9246602Z fpu_exception : yes 2025-05-07T19:43:00.9246685Z cpuid level : 13 2025-05-07T19:43:00.9246765Z wp : yes 2025-05-07T19:43:00.9248741Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida 
arat pku ospke avx512_vnni 2025-05-07T19:43:00.9249098Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9249184Z bogomips : 5999.99 2025-05-07T19:43:00.9249278Z clflush size : 64 2025-05-07T19:43:00.9249356Z cache_alignment : 64 2025-05-07T19:43:00.9249531Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9249626Z power management: 2025-05-07T19:43:00.9249630Z 2025-05-07T19:43:00.9249704Z processor : 40 2025-05-07T19:43:00.9249788Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9249859Z cpu family : 6 2025-05-07T19:43:00.9249942Z model : 85 2025-05-07T19:43:00.9250089Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9250167Z stepping : 7 2025-05-07T19:43:00.9250256Z microcode : 0x5003901 2025-05-07T19:43:00.9250331Z cpu MHz : 2999.996 2025-05-07T19:43:00.9250409Z cache size : 36608 KB 2025-05-07T19:43:00.9250486Z physical id : 1 2025-05-07T19:43:00.9250572Z siblings : 48 2025-05-07T19:43:00.9250641Z core id : 16 2025-05-07T19:43:00.9250716Z cpu cores : 24 2025-05-07T19:43:00.9250805Z apicid : 96 2025-05-07T19:43:00.9250879Z initial apicid : 96 2025-05-07T19:43:00.9250948Z fpu : yes 2025-05-07T19:43:00.9251029Z fpu_exception : yes 2025-05-07T19:43:00.9251118Z cpuid level : 13 2025-05-07T19:43:00.9251186Z wp : yes 2025-05-07T19:43:00.9253145Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku 
ospke avx512_vnni 2025-05-07T19:43:00.9253516Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9253594Z bogomips : 5999.99 2025-05-07T19:43:00.9253674Z clflush size : 64 2025-05-07T19:43:00.9253773Z cache_alignment : 64 2025-05-07T19:43:00.9253892Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9254054Z power management: 2025-05-07T19:43:00.9254058Z 2025-05-07T19:43:00.9254149Z processor : 41 2025-05-07T19:43:00.9254242Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9254322Z cpu family : 6 2025-05-07T19:43:00.9254397Z model : 85 2025-05-07T19:43:00.9254569Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9254708Z stepping : 7 2025-05-07T19:43:00.9254788Z microcode : 0x5003901 2025-05-07T19:43:00.9254889Z cpu MHz : 2999.996 2025-05-07T19:43:00.9254978Z cache size : 36608 KB 2025-05-07T19:43:00.9255063Z physical id : 1 2025-05-07T19:43:00.9255138Z siblings : 48 2025-05-07T19:43:00.9255225Z core id : 17 2025-05-07T19:43:00.9255305Z cpu cores : 24 2025-05-07T19:43:00.9255386Z apicid : 98 2025-05-07T19:43:00.9255480Z initial apicid : 98 2025-05-07T19:43:00.9255551Z fpu : yes 2025-05-07T19:43:00.9255631Z fpu_exception : yes 2025-05-07T19:43:00.9255708Z cpuid level : 13 2025-05-07T19:43:00.9255793Z wp : yes 2025-05-07T19:43:00.9257760Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9258129Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9258208Z bogomips : 5999.99 2025-05-07T19:43:00.9258295Z clflush size : 64 2025-05-07T19:43:00.9258378Z cache_alignment : 64 2025-05-07T19:43:00.9258518Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9258602Z power management: 2025-05-07T19:43:00.9258655Z 2025-05-07T19:43:00.9258732Z processor : 42 2025-05-07T19:43:00.9258823Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9258897Z cpu family : 6 2025-05-07T19:43:00.9258972Z model : 85 2025-05-07T19:43:00.9259116Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9259202Z stepping : 7 2025-05-07T19:43:00.9259279Z microcode : 0x5003901 2025-05-07T19:43:00.9259355Z cpu MHz : 1819.118 2025-05-07T19:43:00.9259440Z cache size : 36608 KB 2025-05-07T19:43:00.9259515Z physical id : 1 2025-05-07T19:43:00.9259591Z siblings : 48 2025-05-07T19:43:00.9259665Z core id : 18 2025-05-07T19:43:00.9259748Z cpu cores : 24 2025-05-07T19:43:00.9259823Z apicid : 100 2025-05-07T19:43:00.9259902Z initial apicid : 100 2025-05-07T19:43:00.9259983Z fpu : yes 2025-05-07T19:43:00.9260061Z fpu_exception : yes 2025-05-07T19:43:00.9260139Z cpuid level : 13 2025-05-07T19:43:00.9260213Z wp : yes 2025-05-07T19:43:00.9262193Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9262550Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9262640Z bogomips : 5999.99 2025-05-07T19:43:00.9262715Z clflush size : 64 2025-05-07T19:43:00.9262792Z cache_alignment : 64 2025-05-07T19:43:00.9262911Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9263002Z power management: 2025-05-07T19:43:00.9263006Z 2025-05-07T19:43:00.9263083Z processor : 43 2025-05-07T19:43:00.9263166Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9263321Z cpu family : 6 2025-05-07T19:43:00.9263405Z model : 85 2025-05-07T19:43:00.9263554Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9263637Z stepping : 7 2025-05-07T19:43:00.9263995Z microcode : 0x5003901 2025-05-07T19:43:00.9264077Z cpu MHz : 2999.996 2025-05-07T19:43:00.9264164Z cache size : 36608 KB 2025-05-07T19:43:00.9264263Z physical id : 1 2025-05-07T19:43:00.9264424Z siblings : 48 2025-05-07T19:43:00.9264502Z core id : 19 2025-05-07T19:43:00.9264581Z cpu cores : 24 2025-05-07T19:43:00.9264688Z apicid : 102 2025-05-07T19:43:00.9264785Z initial apicid : 102 2025-05-07T19:43:00.9264868Z fpu : yes 2025-05-07T19:43:00.9264956Z fpu_exception : yes 2025-05-07T19:43:00.9265060Z cpuid level : 13 2025-05-07T19:43:00.9265145Z wp : yes 2025-05-07T19:43:00.9267278Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9267687Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9267772Z bogomips : 5999.99 2025-05-07T19:43:00.9267875Z clflush size : 64 2025-05-07T19:43:00.9267965Z cache_alignment : 64 2025-05-07T19:43:00.9268099Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9268184Z power management: 2025-05-07T19:43:00.9268189Z 2025-05-07T19:43:00.9268296Z processor : 44 2025-05-07T19:43:00.9268442Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9268527Z cpu family : 6 2025-05-07T19:43:00.9268625Z model : 85 2025-05-07T19:43:00.9268790Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9268872Z stepping : 7 2025-05-07T19:43:00.9268958Z microcode : 0x5003901 2025-05-07T19:43:00.9269072Z cpu MHz : 2999.996 2025-05-07T19:43:00.9269163Z cache size : 36608 KB 2025-05-07T19:43:00.9269243Z physical id : 1 2025-05-07T19:43:00.9269343Z siblings : 48 2025-05-07T19:43:00.9269431Z core id : 20 2025-05-07T19:43:00.9269519Z cpu cores : 24 2025-05-07T19:43:00.9269600Z apicid : 104 2025-05-07T19:43:00.9269707Z initial apicid : 104 2025-05-07T19:43:00.9269797Z fpu : yes 2025-05-07T19:43:00.9269887Z fpu_exception : yes 2025-05-07T19:43:00.9269970Z cpuid level : 13 2025-05-07T19:43:00.9270065Z wp : yes 2025-05-07T19:43:00.9272194Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9272598Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9272684Z bogomips : 5999.99 2025-05-07T19:43:00.9272774Z clflush size : 64 2025-05-07T19:43:00.9272868Z cache_alignment : 64 2025-05-07T19:43:00.9273014Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9273102Z power management: 2025-05-07T19:43:00.9273107Z 2025-05-07T19:43:00.9273189Z processor : 45 2025-05-07T19:43:00.9273296Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9273384Z cpu family : 6 2025-05-07T19:43:00.9273469Z model : 85 2025-05-07T19:43:00.9273653Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9273733Z stepping : 7 2025-05-07T19:43:00.9273826Z microcode : 0x5003901 2025-05-07T19:43:00.9273908Z cpu MHz : 2999.996 2025-05-07T19:43:00.9274059Z cache size : 36608 KB 2025-05-07T19:43:00.9274146Z physical id : 1 2025-05-07T19:43:00.9274234Z siblings : 48 2025-05-07T19:43:00.9274326Z core id : 21 2025-05-07T19:43:00.9274423Z cpu cores : 24 2025-05-07T19:43:00.9274508Z apicid : 106 2025-05-07T19:43:00.9274595Z initial apicid : 106 2025-05-07T19:43:00.9274702Z fpu : yes 2025-05-07T19:43:00.9274786Z fpu_exception : yes 2025-05-07T19:43:00.9274869Z cpuid level : 13 2025-05-07T19:43:00.9274954Z wp : yes 2025-05-07T19:43:00.9277126Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9277489Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9277584Z bogomips : 5999.99 2025-05-07T19:43:00.9277665Z clflush size : 64 2025-05-07T19:43:00.9277745Z cache_alignment : 64 2025-05-07T19:43:00.9277870Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9277969Z power management: 2025-05-07T19:43:00.9277974Z 2025-05-07T19:43:00.9278050Z processor : 46 2025-05-07T19:43:00.9278136Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9278219Z cpu family : 6 2025-05-07T19:43:00.9278295Z model : 85 2025-05-07T19:43:00.9278496Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9278582Z stepping : 7 2025-05-07T19:43:00.9278662Z microcode : 0x5003901 2025-05-07T19:43:00.9278743Z cpu MHz : 1493.110 2025-05-07T19:43:00.9278824Z cache size : 36608 KB 2025-05-07T19:43:00.9278917Z physical id : 1 2025-05-07T19:43:00.9278991Z siblings : 48 2025-05-07T19:43:00.9279063Z core id : 22 2025-05-07T19:43:00.9279139Z cpu cores : 24 2025-05-07T19:43:00.9279228Z apicid : 108 2025-05-07T19:43:00.9279309Z initial apicid : 108 2025-05-07T19:43:00.9279382Z fpu : yes 2025-05-07T19:43:00.9279476Z fpu_exception : yes 2025-05-07T19:43:00.9279552Z cpuid level : 13 2025-05-07T19:43:00.9279621Z wp : yes 2025-05-07T19:43:00.9281601Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9281961Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9282043Z bogomips : 5999.99 2025-05-07T19:43:00.9282124Z clflush size : 64 2025-05-07T19:43:00.9282205Z cache_alignment : 64 2025-05-07T19:43:00.9282326Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9282403Z power management: 2025-05-07T19:43:00.9282414Z 2025-05-07T19:43:00.9282492Z processor : 47 2025-05-07T19:43:00.9282576Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9282647Z cpu family : 6 2025-05-07T19:43:00.9282720Z model : 85 2025-05-07T19:43:00.9282870Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9282943Z stepping : 7 2025-05-07T19:43:00.9283020Z microcode : 0x5003901 2025-05-07T19:43:00.9283096Z cpu MHz : 2999.996 2025-05-07T19:43:00.9283176Z cache size : 36608 KB 2025-05-07T19:43:00.9283250Z physical id : 1 2025-05-07T19:43:00.9283384Z siblings : 48 2025-05-07T19:43:00.9283454Z core id : 23 2025-05-07T19:43:00.9283525Z cpu cores : 24 2025-05-07T19:43:00.9283600Z apicid : 110 2025-05-07T19:43:00.9283688Z initial apicid : 110 2025-05-07T19:43:00.9283764Z fpu : yes 2025-05-07T19:43:00.9283841Z fpu_exception : yes 2025-05-07T19:43:00.9283940Z cpuid level : 13 2025-05-07T19:43:00.9284012Z wp : yes 2025-05-07T19:43:00.9285980Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke 
avx512_vnni 2025-05-07T19:43:00.9286352Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9286431Z bogomips : 5999.99 2025-05-07T19:43:00.9286505Z clflush size : 64 2025-05-07T19:43:00.9286590Z cache_alignment : 64 2025-05-07T19:43:00.9286712Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9286791Z power management: 2025-05-07T19:43:00.9286795Z 2025-05-07T19:43:00.9286867Z processor : 48 2025-05-07T19:43:00.9286955Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9287027Z cpu family : 6 2025-05-07T19:43:00.9287095Z model : 85 2025-05-07T19:43:00.9287248Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9287375Z stepping : 7 2025-05-07T19:43:00.9287452Z microcode : 0x5003901 2025-05-07T19:43:00.9287528Z cpu MHz : 2999.996 2025-05-07T19:43:00.9287607Z cache size : 36608 KB 2025-05-07T19:43:00.9287680Z physical id : 0 2025-05-07T19:43:00.9287751Z siblings : 48 2025-05-07T19:43:00.9287830Z core id : 0 2025-05-07T19:43:00.9287904Z cpu cores : 24 2025-05-07T19:43:00.9287974Z apicid : 1 2025-05-07T19:43:00.9288051Z initial apicid : 1 2025-05-07T19:43:00.9288126Z fpu : yes 2025-05-07T19:43:00.9288205Z fpu_exception : yes 2025-05-07T19:43:00.9288276Z cpuid level : 13 2025-05-07T19:43:00.9288350Z wp : yes 2025-05-07T19:43:00.9290314Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9290662Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9290742Z bogomips : 5999.99 2025-05-07T19:43:00.9290813Z clflush size : 64 2025-05-07T19:43:00.9290894Z cache_alignment : 64 2025-05-07T19:43:00.9291017Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9291095Z power management: 2025-05-07T19:43:00.9291099Z 2025-05-07T19:43:00.9291170Z processor : 49 2025-05-07T19:43:00.9291249Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9291328Z cpu family : 6 2025-05-07T19:43:00.9291401Z model : 85 2025-05-07T19:43:00.9291545Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9291631Z stepping : 7 2025-05-07T19:43:00.9291709Z microcode : 0x5003901 2025-05-07T19:43:00.9291781Z cpu MHz : 3192.988 2025-05-07T19:43:00.9291861Z cache size : 36608 KB 2025-05-07T19:43:00.9291946Z physical id : 0 2025-05-07T19:43:00.9292019Z siblings : 48 2025-05-07T19:43:00.9292091Z core id : 1 2025-05-07T19:43:00.9292176Z cpu cores : 24 2025-05-07T19:43:00.9292298Z apicid : 3 2025-05-07T19:43:00.9292387Z initial apicid : 3 2025-05-07T19:43:00.9292460Z fpu : yes 2025-05-07T19:43:00.9292552Z fpu_exception : yes 2025-05-07T19:43:00.9292630Z cpuid level : 13 2025-05-07T19:43:00.9292700Z wp : yes 2025-05-07T19:43:00.9294687Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9295045Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9295125Z bogomips : 5999.99 2025-05-07T19:43:00.9295210Z clflush size : 64 2025-05-07T19:43:00.9295291Z cache_alignment : 64 2025-05-07T19:43:00.9295414Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9295502Z power management: 2025-05-07T19:43:00.9295506Z 2025-05-07T19:43:00.9295583Z processor : 50 2025-05-07T19:43:00.9295669Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9295749Z cpu family : 6 2025-05-07T19:43:00.9295830Z model : 85 2025-05-07T19:43:00.9295984Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9296061Z stepping : 7 2025-05-07T19:43:00.9296147Z microcode : 0x5003901 2025-05-07T19:43:00.9296709Z cpu MHz : 3165.965 2025-05-07T19:43:00.9296792Z cache size : 36608 KB 2025-05-07T19:43:00.9296873Z physical id : 0 2025-05-07T19:43:00.9296959Z siblings : 48 2025-05-07T19:43:00.9297034Z core id : 2 2025-05-07T19:43:00.9297112Z cpu cores : 24 2025-05-07T19:43:00.9297203Z apicid : 5 2025-05-07T19:43:00.9297287Z initial apicid : 5 2025-05-07T19:43:00.9297368Z fpu : yes 2025-05-07T19:43:00.9297450Z fpu_exception : yes 2025-05-07T19:43:00.9297539Z cpuid level : 13 2025-05-07T19:43:00.9297613Z wp : yes 2025-05-07T19:43:00.9299581Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9299942Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9300025Z bogomips : 5999.99 2025-05-07T19:43:00.9300103Z clflush size : 64 2025-05-07T19:43:00.9300194Z cache_alignment : 64 2025-05-07T19:43:00.9300316Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9300397Z power management: 2025-05-07T19:43:00.9300401Z 2025-05-07T19:43:00.9300491Z processor : 51 2025-05-07T19:43:00.9300573Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9300648Z cpu family : 6 2025-05-07T19:43:00.9300721Z model : 85 2025-05-07T19:43:00.9300882Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9300956Z stepping : 7 2025-05-07T19:43:00.9301035Z microcode : 0x5003901 2025-05-07T19:43:00.9301123Z cpu MHz : 2999.996 2025-05-07T19:43:00.9301201Z cache size : 36608 KB 2025-05-07T19:43:00.9301278Z physical id : 0 2025-05-07T19:43:00.9301351Z siblings : 48 2025-05-07T19:43:00.9301436Z core id : 3 2025-05-07T19:43:00.9301510Z cpu cores : 24 2025-05-07T19:43:00.9301584Z apicid : 7 2025-05-07T19:43:00.9301663Z initial apicid : 7 2025-05-07T19:43:00.9301819Z fpu : yes 2025-05-07T19:43:00.9301898Z fpu_exception : yes 2025-05-07T19:43:00.9301974Z cpuid level : 13 2025-05-07T19:43:00.9302060Z wp : yes 2025-05-07T19:43:00.9304329Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9304713Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9304811Z bogomips : 5999.99 2025-05-07T19:43:00.9304896Z clflush size : 64 2025-05-07T19:43:00.9304984Z cache_alignment : 64 2025-05-07T19:43:00.9305125Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9305212Z power management: 2025-05-07T19:43:00.9305216Z 2025-05-07T19:43:00.9305299Z processor : 52 2025-05-07T19:43:00.9305398Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9305480Z cpu family : 6 2025-05-07T19:43:00.9305559Z model : 85 2025-05-07T19:43:00.9305718Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9305813Z stepping : 7 2025-05-07T19:43:00.9305898Z microcode : 0x5003901 2025-05-07T19:43:00.9305976Z cpu MHz : 2999.996 2025-05-07T19:43:00.9306073Z cache size : 36608 KB 2025-05-07T19:43:00.9306215Z physical id : 0 2025-05-07T19:43:00.9306296Z siblings : 48 2025-05-07T19:43:00.9306374Z core id : 4 2025-05-07T19:43:00.9306469Z cpu cores : 24 2025-05-07T19:43:00.9306549Z apicid : 9 2025-05-07T19:43:00.9306632Z initial apicid : 9 2025-05-07T19:43:00.9306709Z fpu : yes 2025-05-07T19:43:00.9306806Z fpu_exception : yes 2025-05-07T19:43:00.9306893Z cpuid level : 13 2025-05-07T19:43:00.9306971Z wp : yes 2025-05-07T19:43:00.9309274Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9309657Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9309740Z bogomips : 5999.99 2025-05-07T19:43:00.9309834Z clflush size : 64 2025-05-07T19:43:00.9309922Z cache_alignment : 64 2025-05-07T19:43:00.9310054Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9310327Z power management: 2025-05-07T19:43:00.9310333Z 2025-05-07T19:43:00.9310414Z processor : 53 2025-05-07T19:43:00.9310500Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9310580Z cpu family : 6 2025-05-07T19:43:00.9310664Z model : 85 2025-05-07T19:43:00.9310817Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9310899Z stepping : 7 2025-05-07T19:43:00.9310989Z microcode : 0x5003901 2025-05-07T19:43:00.9311069Z cpu MHz : 2999.996 2025-05-07T19:43:00.9311153Z cache size : 36608 KB 2025-05-07T19:43:00.9311236Z physical id : 0 2025-05-07T19:43:00.9311321Z siblings : 48 2025-05-07T19:43:00.9311404Z core id : 5 2025-05-07T19:43:00.9311486Z cpu cores : 24 2025-05-07T19:43:00.9311575Z apicid : 11 2025-05-07T19:43:00.9311656Z initial apicid : 11 2025-05-07T19:43:00.9311736Z fpu : yes 2025-05-07T19:43:00.9311823Z fpu_exception : yes 2025-05-07T19:43:00.9311915Z cpuid level : 13 2025-05-07T19:43:00.9312095Z wp : yes 2025-05-07T19:43:00.9314222Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9314627Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9314711Z bogomips : 5999.99 2025-05-07T19:43:00.9314795Z clflush size : 64 2025-05-07T19:43:00.9314900Z cache_alignment : 64 2025-05-07T19:43:00.9315032Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9315122Z power management: 2025-05-07T19:43:00.9315126Z 2025-05-07T19:43:00.9315223Z processor : 54 2025-05-07T19:43:00.9315317Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9315399Z cpu family : 6 2025-05-07T19:43:00.9315478Z model : 85 2025-05-07T19:43:00.9315650Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9315733Z stepping : 7 2025-05-07T19:43:00.9315818Z microcode : 0x5003901 2025-05-07T19:43:00.9315915Z cpu MHz : 3255.310 2025-05-07T19:43:00.9315996Z cache size : 36608 KB 2025-05-07T19:43:00.9316081Z physical id : 0 2025-05-07T19:43:00.9316162Z siblings : 48 2025-05-07T19:43:00.9316252Z core id : 6 2025-05-07T19:43:00.9316397Z cpu cores : 24 2025-05-07T19:43:00.9316475Z apicid : 13 2025-05-07T19:43:00.9316572Z initial apicid : 13 2025-05-07T19:43:00.9316646Z fpu : yes 2025-05-07T19:43:00.9316729Z fpu_exception : yes 2025-05-07T19:43:00.9316813Z cpuid level : 13 2025-05-07T19:43:00.9316907Z wp : yes 2025-05-07T19:43:00.9319029Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9319430Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9319519Z bogomips : 5999.99 2025-05-07T19:43:00.9319602Z clflush size : 64 2025-05-07T19:43:00.9319687Z cache_alignment : 64 2025-05-07T19:43:00.9319831Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9319924Z power management: 2025-05-07T19:43:00.9319929Z 2025-05-07T19:43:00.9320012Z processor : 55 2025-05-07T19:43:00.9320110Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9320192Z cpu family : 6 2025-05-07T19:43:00.9320272Z model : 85 2025-05-07T19:43:00.9320429Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9320523Z stepping : 7 2025-05-07T19:43:00.9320740Z microcode : 0x5003901 2025-05-07T19:43:00.9320818Z cpu MHz : 3173.896 2025-05-07T19:43:00.9320911Z cache size : 36608 KB 2025-05-07T19:43:00.9321106Z physical id : 0 2025-05-07T19:43:00.9321181Z siblings : 48 2025-05-07T19:43:00.9321249Z core id : 7 2025-05-07T19:43:00.9321333Z cpu cores : 24 2025-05-07T19:43:00.9321413Z apicid : 15 2025-05-07T19:43:00.9321491Z initial apicid : 15 2025-05-07T19:43:00.9321574Z fpu : yes 2025-05-07T19:43:00.9321651Z fpu_exception : yes 2025-05-07T19:43:00.9321730Z cpuid level : 13 2025-05-07T19:43:00.9321801Z wp : yes 2025-05-07T19:43:00.9323771Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9324180Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9324265Z bogomips : 5999.99 2025-05-07T19:43:00.9324339Z clflush size : 64 2025-05-07T19:43:00.9324421Z cache_alignment : 64 2025-05-07T19:43:00.9324540Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9324626Z power management: 2025-05-07T19:43:00.9324633Z 2025-05-07T19:43:00.9324705Z processor : 56 2025-05-07T19:43:00.9324789Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9324876Z cpu family : 6 2025-05-07T19:43:00.9324948Z model : 85 2025-05-07T19:43:00.9325093Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9325172Z stepping : 7 2025-05-07T19:43:00.9325259Z microcode : 0x5003901 2025-05-07T19:43:00.9325332Z cpu MHz : 3195.604 2025-05-07T19:43:00.9325411Z cache size : 36608 KB 2025-05-07T19:43:00.9325493Z physical id : 0 2025-05-07T19:43:00.9325566Z siblings : 48 2025-05-07T19:43:00.9325640Z core id : 8 2025-05-07T19:43:00.9325714Z cpu cores : 24 2025-05-07T19:43:00.9325791Z apicid : 17 2025-05-07T19:43:00.9325919Z initial apicid : 17 2025-05-07T19:43:00.9325991Z fpu : yes 2025-05-07T19:43:00.9326070Z fpu_exception : yes 2025-05-07T19:43:00.9326153Z cpuid level : 13 2025-05-07T19:43:00.9326226Z wp : yes 2025-05-07T19:43:00.9328187Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9328550Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9328626Z bogomips : 5999.99 2025-05-07T19:43:00.9328714Z clflush size : 64 2025-05-07T19:43:00.9328791Z cache_alignment : 64 2025-05-07T19:43:00.9328904Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9328979Z power management: 2025-05-07T19:43:00.9328983Z 2025-05-07T19:43:00.9329069Z processor : 57 2025-05-07T19:43:00.9329155Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9329229Z cpu family : 6 2025-05-07T19:43:00.9329310Z model : 85 2025-05-07T19:43:00.9329457Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9329530Z stepping : 7 2025-05-07T19:43:00.9329607Z microcode : 0x5003901 2025-05-07T19:43:00.9329692Z cpu MHz : 3174.003 2025-05-07T19:43:00.9329769Z cache size : 36608 KB 2025-05-07T19:43:00.9329844Z physical id : 0 2025-05-07T19:43:00.9329930Z siblings : 48 2025-05-07T19:43:00.9330006Z core id : 9 2025-05-07T19:43:00.9330085Z cpu cores : 24 2025-05-07T19:43:00.9330157Z apicid : 19 2025-05-07T19:43:00.9330246Z initial apicid : 19 2025-05-07T19:43:00.9330318Z fpu : yes 2025-05-07T19:43:00.9330400Z fpu_exception : yes 2025-05-07T19:43:00.9330474Z cpuid level : 13 2025-05-07T19:43:00.9330557Z wp : yes 2025-05-07T19:43:00.9332517Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9332932Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9333012Z bogomips : 5999.99 2025-05-07T19:43:00.9333091Z clflush size : 64 2025-05-07T19:43:00.9333176Z cache_alignment : 64 2025-05-07T19:43:00.9333304Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9333385Z power management: 2025-05-07T19:43:00.9333389Z 2025-05-07T19:43:00.9333465Z processor : 58 2025-05-07T19:43:00.9333557Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9333633Z cpu family : 6 2025-05-07T19:43:00.9333706Z model : 85 2025-05-07T19:43:00.9333856Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9333932Z stepping : 7 2025-05-07T19:43:00.9334011Z microcode : 0x5003901 2025-05-07T19:43:00.9334086Z cpu MHz : 3230.537 2025-05-07T19:43:00.9334172Z cache size : 36608 KB 2025-05-07T19:43:00.9334248Z physical id : 0 2025-05-07T19:43:00.9334322Z siblings : 48 2025-05-07T19:43:00.9334393Z core id : 10 2025-05-07T19:43:00.9334476Z cpu cores : 24 2025-05-07T19:43:00.9334550Z apicid : 21 2025-05-07T19:43:00.9334630Z initial apicid : 21 2025-05-07T19:43:00.9334713Z fpu : yes 2025-05-07T19:43:00.9334793Z fpu_exception : yes 2025-05-07T19:43:00.9334933Z cpuid level : 13 2025-05-07T19:43:00.9335003Z wp : yes 2025-05-07T19:43:00.9336974Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9337331Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9337417Z bogomips : 5999.99 2025-05-07T19:43:00.9337498Z clflush size : 64 2025-05-07T19:43:00.9337579Z cache_alignment : 64 2025-05-07T19:43:00.9337706Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9337793Z power management: 2025-05-07T19:43:00.9337797Z 2025-05-07T19:43:00.9337871Z processor : 59 2025-05-07T19:43:00.9337954Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9338040Z cpu family : 6 2025-05-07T19:43:00.9338112Z model : 85 2025-05-07T19:43:00.9338255Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9338332Z stepping : 7 2025-05-07T19:43:00.9338423Z microcode : 0x5003901 2025-05-07T19:43:00.9338501Z cpu MHz : 3235.060 2025-05-07T19:43:00.9338577Z cache size : 36608 KB 2025-05-07T19:43:00.9338665Z physical id : 0 2025-05-07T19:43:00.9338737Z siblings : 48 2025-05-07T19:43:00.9338811Z core id : 11 2025-05-07T19:43:00.9338882Z cpu cores : 24 2025-05-07T19:43:00.9338967Z apicid : 23 2025-05-07T19:43:00.9339043Z initial apicid : 23 2025-05-07T19:43:00.9339112Z fpu : yes 2025-05-07T19:43:00.9339203Z fpu_exception : yes 2025-05-07T19:43:00.9339276Z cpuid level : 13 2025-05-07T19:43:00.9339347Z wp : yes 2025-05-07T19:43:00.9341335Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9341749Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9341824Z bogomips : 5999.99 2025-05-07T19:43:00.9341913Z clflush size : 64 2025-05-07T19:43:00.9341995Z cache_alignment : 64 2025-05-07T19:43:00.9342122Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9342200Z power management: 2025-05-07T19:43:00.9342204Z 2025-05-07T19:43:00.9342290Z processor : 60 2025-05-07T19:43:00.9342373Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9342450Z cpu family : 6 2025-05-07T19:43:00.9342535Z model : 85 2025-05-07T19:43:00.9342683Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9342757Z stepping : 7 2025-05-07T19:43:00.9342836Z microcode : 0x5003901 2025-05-07T19:43:00.9342927Z cpu MHz : 3186.076 2025-05-07T19:43:00.9343009Z cache size : 36608 KB 2025-05-07T19:43:00.9343083Z physical id : 0 2025-05-07T19:43:00.9343171Z siblings : 48 2025-05-07T19:43:00.9343310Z core id : 12 2025-05-07T19:43:00.9343412Z cpu cores : 24 2025-05-07T19:43:00.9343487Z apicid : 25 2025-05-07T19:43:00.9343581Z initial apicid : 25 2025-05-07T19:43:00.9343651Z fpu : yes 2025-05-07T19:43:00.9343899Z fpu_exception : yes 2025-05-07T19:43:00.9343993Z cpuid level : 13 2025-05-07T19:43:00.9344071Z wp : yes 2025-05-07T19:43:00.9346254Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9346653Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9346736Z bogomips : 5999.99 2025-05-07T19:43:00.9346821Z clflush size : 64 2025-05-07T19:43:00.9346913Z cache_alignment : 64 2025-05-07T19:43:00.9347043Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9347133Z power management: 2025-05-07T19:43:00.9347138Z 2025-05-07T19:43:00.9347220Z processor : 61 2025-05-07T19:43:00.9347317Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9347400Z cpu family : 6 2025-05-07T19:43:00.9347480Z model : 85 2025-05-07T19:43:00.9347655Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9347744Z stepping : 7 2025-05-07T19:43:00.9347832Z microcode : 0x5003901 2025-05-07T19:43:00.9347916Z cpu MHz : 3153.773 2025-05-07T19:43:00.9348021Z cache size : 36608 KB 2025-05-07T19:43:00.9348110Z physical id : 0 2025-05-07T19:43:00.9348193Z siblings : 48 2025-05-07T19:43:00.9348293Z core id : 13 2025-05-07T19:43:00.9348378Z cpu cores : 24 2025-05-07T19:43:00.9348464Z apicid : 27 2025-05-07T19:43:00.9348555Z initial apicid : 27 2025-05-07T19:43:00.9348654Z fpu : yes 2025-05-07T19:43:00.9348745Z fpu_exception : yes 2025-05-07T19:43:00.9348832Z cpuid level : 13 2025-05-07T19:43:00.9348930Z wp : yes 2025-05-07T19:43:00.9351124Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9351567Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9351674Z bogomips : 5999.99 2025-05-07T19:43:00.9351760Z clflush size : 64 2025-05-07T19:43:00.9351852Z cache_alignment : 64 2025-05-07T19:43:00.9352007Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9352097Z power management: 2025-05-07T19:43:00.9352101Z 2025-05-07T19:43:00.9352189Z processor : 62 2025-05-07T19:43:00.9352293Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9352400Z cpu family : 6 2025-05-07T19:43:00.9352483Z model : 85 2025-05-07T19:43:00.9352646Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9352753Z stepping : 7 2025-05-07T19:43:00.9352846Z microcode : 0x5003901 2025-05-07T19:43:00.9352934Z cpu MHz : 3452.881 2025-05-07T19:43:00.9353023Z cache size : 36608 KB 2025-05-07T19:43:00.9353128Z physical id : 0 2025-05-07T19:43:00.9353215Z siblings : 48 2025-05-07T19:43:00.9353300Z core id : 14 2025-05-07T19:43:00.9353405Z cpu cores : 24 2025-05-07T19:43:00.9353494Z apicid : 29 2025-05-07T19:43:00.9353585Z initial apicid : 29 2025-05-07T19:43:00.9353670Z fpu : yes 2025-05-07T19:43:00.9353779Z fpu_exception : yes 2025-05-07T19:43:00.9353866Z cpuid level : 13 2025-05-07T19:43:00.9353952Z wp : yes 2025-05-07T19:43:00.9356263Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9356627Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9356713Z bogomips : 5999.99 2025-05-07T19:43:00.9356814Z clflush size : 64 2025-05-07T19:43:00.9356900Z cache_alignment : 64 2025-05-07T19:43:00.9357028Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9357132Z power management: 2025-05-07T19:43:00.9357137Z 2025-05-07T19:43:00.9357216Z processor : 63 2025-05-07T19:43:00.9357309Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9357389Z cpu family : 6 2025-05-07T19:43:00.9357481Z model : 85 2025-05-07T19:43:00.9357630Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9357709Z stepping : 7 2025-05-07T19:43:00.9357809Z microcode : 0x5003901 2025-05-07T19:43:00.9357890Z cpu MHz : 3183.835 2025-05-07T19:43:00.9357972Z cache size : 36608 KB 2025-05-07T19:43:00.9358052Z physical id : 0 2025-05-07T19:43:00.9358140Z siblings : 48 2025-05-07T19:43:00.9358218Z core id : 15 2025-05-07T19:43:00.9358295Z cpu cores : 24 2025-05-07T19:43:00.9358384Z apicid : 31 2025-05-07T19:43:00.9358466Z initial apicid : 31 2025-05-07T19:43:00.9358541Z fpu : yes 2025-05-07T19:43:00.9358625Z fpu_exception : yes 2025-05-07T19:43:00.9358719Z cpuid level : 13 2025-05-07T19:43:00.9358793Z wp : yes 2025-05-07T19:43:00.9360757Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9361179Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9361261Z bogomips : 5999.99 2025-05-07T19:43:00.9361342Z clflush size : 64 2025-05-07T19:43:00.9361440Z cache_alignment : 64 2025-05-07T19:43:00.9361562Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9361645Z power management: 2025-05-07T19:43:00.9361650Z 2025-05-07T19:43:00.9361744Z processor : 64 2025-05-07T19:43:00.9361831Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9361914Z cpu family : 6 2025-05-07T19:43:00.9361993Z model : 85 2025-05-07T19:43:00.9362161Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9362240Z stepping : 7 2025-05-07T19:43:00.9362323Z microcode : 0x5003901 2025-05-07T19:43:00.9362420Z cpu MHz : 2999.996 2025-05-07T19:43:00.9362506Z cache size : 36608 KB 2025-05-07T19:43:00.9362592Z physical id : 0 2025-05-07T19:43:00.9362671Z siblings : 48 2025-05-07T19:43:00.9362760Z core id : 16 2025-05-07T19:43:00.9362840Z cpu cores : 24 2025-05-07T19:43:00.9362918Z apicid : 33 2025-05-07T19:43:00.9363000Z initial apicid : 33 2025-05-07T19:43:00.9363092Z fpu : yes 2025-05-07T19:43:00.9363177Z fpu_exception : yes 2025-05-07T19:43:00.9363256Z cpuid level : 13 2025-05-07T19:43:00.9363345Z wp : yes 2025-05-07T19:43:00.9365365Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9365728Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9365824Z bogomips : 5999.99 2025-05-07T19:43:00.9365906Z clflush size : 64 2025-05-07T19:43:00.9365989Z cache_alignment : 64 2025-05-07T19:43:00.9366127Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9366211Z power management: 2025-05-07T19:43:00.9366215Z 2025-05-07T19:43:00.9366296Z processor : 65 2025-05-07T19:43:00.9366398Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9366478Z cpu family : 6 2025-05-07T19:43:00.9366558Z model : 85 2025-05-07T19:43:00.9366708Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9366802Z stepping : 7 2025-05-07T19:43:00.9366885Z microcode : 0x5003901 2025-05-07T19:43:00.9366966Z cpu MHz : 2999.996 2025-05-07T19:43:00.9367048Z cache size : 36608 KB 2025-05-07T19:43:00.9367149Z physical id : 0 2025-05-07T19:43:00.9367229Z siblings : 48 2025-05-07T19:43:00.9367307Z core id : 17 2025-05-07T19:43:00.9367399Z cpu cores : 24 2025-05-07T19:43:00.9367476Z apicid : 35 2025-05-07T19:43:00.9367560Z initial apicid : 35 2025-05-07T19:43:00.9367636Z fpu : yes 2025-05-07T19:43:00.9367734Z fpu_exception : yes 2025-05-07T19:43:00.9367813Z cpuid level : 13 2025-05-07T19:43:00.9367890Z wp : yes 2025-05-07T19:43:00.9369868Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9370280Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9370362Z bogomips : 5999.99 2025-05-07T19:43:00.9370456Z clflush size : 64 2025-05-07T19:43:00.9370541Z cache_alignment : 64 2025-05-07T19:43:00.9370666Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9370762Z power management: 2025-05-07T19:43:00.9370766Z 2025-05-07T19:43:00.9370846Z processor : 66 2025-05-07T19:43:00.9370932Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9371012Z cpu family : 6 2025-05-07T19:43:00.9371102Z model : 85 2025-05-07T19:43:00.9371253Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9371332Z stepping : 7 2025-05-07T19:43:00.9371430Z microcode : 0x5003901 2025-05-07T19:43:00.9371510Z cpu MHz : 3766.605 2025-05-07T19:43:00.9371590Z cache size : 36608 KB 2025-05-07T19:43:00.9371670Z physical id : 0 2025-05-07T19:43:00.9371765Z siblings : 48 2025-05-07T19:43:00.9371842Z core id : 18 2025-05-07T19:43:00.9371921Z cpu cores : 24 2025-05-07T19:43:00.9372014Z apicid : 37 2025-05-07T19:43:00.9372096Z initial apicid : 37 2025-05-07T19:43:00.9372172Z fpu : yes 2025-05-07T19:43:00.9372256Z fpu_exception : yes 2025-05-07T19:43:00.9372350Z cpuid level : 13 2025-05-07T19:43:00.9372428Z wp : yes 2025-05-07T19:43:00.9374466Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9374844Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9374925Z bogomips : 5999.99 2025-05-07T19:43:00.9375006Z clflush size : 64 2025-05-07T19:43:00.9375102Z cache_alignment : 64 2025-05-07T19:43:00.9375226Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9375308Z power management: 2025-05-07T19:43:00.9375312Z 2025-05-07T19:43:00.9375406Z processor : 67 2025-05-07T19:43:00.9375495Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9375575Z cpu family : 6 2025-05-07T19:43:00.9375650Z model : 85 2025-05-07T19:43:00.9375815Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9375896Z stepping : 7 2025-05-07T19:43:00.9375978Z microcode : 0x5003901 2025-05-07T19:43:00.9376071Z cpu MHz : 3244.574 2025-05-07T19:43:00.9376154Z cache size : 36608 KB 2025-05-07T19:43:00.9376234Z physical id : 0 2025-05-07T19:43:00.9376313Z siblings : 48 2025-05-07T19:43:00.9376410Z core id : 19 2025-05-07T19:43:00.9376487Z cpu cores : 24 2025-05-07T19:43:00.9376565Z apicid : 39 2025-05-07T19:43:00.9376661Z initial apicid : 39 2025-05-07T19:43:00.9376738Z fpu : yes 2025-05-07T19:43:00.9376821Z fpu_exception : yes 2025-05-07T19:43:00.9376901Z cpuid level : 13 2025-05-07T19:43:00.9376992Z wp : yes 2025-05-07T19:43:00.9378962Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9379334Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9379469Z bogomips : 5999.99 2025-05-07T19:43:00.9379550Z clflush size : 64 2025-05-07T19:43:00.9379635Z cache_alignment : 64 2025-05-07T19:43:00.9379776Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9379862Z power management: 2025-05-07T19:43:00.9379867Z 2025-05-07T19:43:00.9379948Z processor : 68 2025-05-07T19:43:00.9380050Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9380132Z cpu family : 6 2025-05-07T19:43:00.9380209Z model : 85 2025-05-07T19:43:00.9380360Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9380453Z stepping : 7 2025-05-07T19:43:00.9380537Z microcode : 0x5003901 2025-05-07T19:43:00.9380618Z cpu MHz : 3203.165 2025-05-07T19:43:00.9380714Z cache size : 36608 KB 2025-05-07T19:43:00.9380794Z physical id : 0 2025-05-07T19:43:00.9380871Z siblings : 48 2025-05-07T19:43:00.9380948Z core id : 20 2025-05-07T19:43:00.9381043Z cpu cores : 24 2025-05-07T19:43:00.9381122Z apicid : 41 2025-05-07T19:43:00.9381204Z initial apicid : 41 2025-05-07T19:43:00.9381294Z fpu : yes 2025-05-07T19:43:00.9381378Z fpu_exception : yes 2025-05-07T19:43:00.9381458Z cpuid level : 13 2025-05-07T19:43:00.9381535Z wp : yes 2025-05-07T19:43:00.9383663Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9384232Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9384430Z bogomips : 5999.99 2025-05-07T19:43:00.9384518Z clflush size : 64 2025-05-07T19:43:00.9384611Z cache_alignment : 64 2025-05-07T19:43:00.9384746Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9384852Z power management: 2025-05-07T19:43:00.9384857Z 2025-05-07T19:43:00.9384944Z processor : 69 2025-05-07T19:43:00.9385037Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9385136Z cpu family : 6 2025-05-07T19:43:00.9385218Z model : 85 2025-05-07T19:43:00.9385379Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9385465Z stepping : 7 2025-05-07T19:43:00.9385572Z microcode : 0x5003901 2025-05-07T19:43:00.9385663Z cpu MHz : 2999.996 2025-05-07T19:43:00.9385752Z cache size : 36608 KB 2025-05-07T19:43:00.9385857Z physical id : 0 2025-05-07T19:43:00.9385939Z siblings : 48 2025-05-07T19:43:00.9386023Z core id : 21 2025-05-07T19:43:00.9386109Z cpu cores : 24 2025-05-07T19:43:00.9386212Z apicid : 43 2025-05-07T19:43:00.9386308Z initial apicid : 43 2025-05-07T19:43:00.9386392Z fpu : yes 2025-05-07T19:43:00.9386501Z fpu_exception : yes 2025-05-07T19:43:00.9386589Z cpuid level : 13 2025-05-07T19:43:00.9386674Z wp : yes 2025-05-07T19:43:00.9388807Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9389213Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9389357Z bogomips : 5999.99 2025-05-07T19:43:00.9389466Z clflush size : 64 2025-05-07T19:43:00.9389559Z cache_alignment : 64 2025-05-07T19:43:00.9389693Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9389782Z power management: 2025-05-07T19:43:00.9389786Z 2025-05-07T19:43:00.9389891Z processor : 70 2025-05-07T19:43:00.9389987Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9390074Z cpu family : 6 2025-05-07T19:43:00.9390176Z model : 85 2025-05-07T19:43:00.9390338Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9390425Z stepping : 7 2025-05-07T19:43:00.9390517Z microcode : 0x5003901 2025-05-07T19:43:00.9390622Z cpu MHz : 3401.382 2025-05-07T19:43:00.9390716Z cache size : 36608 KB 2025-05-07T19:43:00.9390803Z physical id : 0 2025-05-07T19:43:00.9390907Z siblings : 48 2025-05-07T19:43:00.9390992Z core id : 22 2025-05-07T19:43:00.9391079Z cpu cores : 24 2025-05-07T19:43:00.9391166Z apicid : 45 2025-05-07T19:43:00.9391275Z initial apicid : 45 2025-05-07T19:43:00.9391363Z fpu : yes 2025-05-07T19:43:00.9391453Z fpu_exception : yes 2025-05-07T19:43:00.9391539Z cpuid level : 13 2025-05-07T19:43:00.9391637Z wp : yes 2025-05-07T19:43:00.9393811Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9394214Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9394301Z bogomips : 5999.99 2025-05-07T19:43:00.9394391Z clflush size : 64 2025-05-07T19:43:00.9394481Z cache_alignment : 64 2025-05-07T19:43:00.9394627Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9394716Z power management: 2025-05-07T19:43:00.9394720Z 2025-05-07T19:43:00.9394806Z processor : 71 2025-05-07T19:43:00.9394910Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9394991Z cpu family : 6 2025-05-07T19:43:00.9395072Z model : 85 2025-05-07T19:43:00.9395247Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9395329Z stepping : 7 2025-05-07T19:43:00.9395416Z microcode : 0x5003901 2025-05-07T19:43:00.9395501Z cpu MHz : 3220.413 2025-05-07T19:43:00.9395601Z cache size : 36608 KB 2025-05-07T19:43:00.9395690Z physical id : 0 2025-05-07T19:43:00.9395774Z siblings : 48 2025-05-07T19:43:00.9395853Z core id : 23 2025-05-07T19:43:00.9395951Z cpu cores : 24 2025-05-07T19:43:00.9396142Z apicid : 47 2025-05-07T19:43:00.9396225Z initial apicid : 47 2025-05-07T19:43:00.9410364Z fpu : yes 2025-05-07T19:43:00.9410535Z fpu_exception : yes 2025-05-07T19:43:00.9410632Z cpuid level : 13 2025-05-07T19:43:00.9410715Z wp : yes 2025-05-07T19:43:00.9412883Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9413281Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9413363Z bogomips : 5999.99 2025-05-07T19:43:00.9413451Z clflush size : 64 2025-05-07T19:43:00.9413693Z cache_alignment : 64 2025-05-07T19:43:00.9413822Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9413915Z power management: 2025-05-07T19:43:00.9413922Z 2025-05-07T19:43:00.9414020Z processor : 72 2025-05-07T19:43:00.9414112Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9414199Z cpu family : 6 2025-05-07T19:43:00.9414289Z model : 85 2025-05-07T19:43:00.9414452Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9414531Z stepping : 7 2025-05-07T19:43:00.9414615Z microcode : 0x5003901 2025-05-07T19:43:00.9414707Z cpu MHz : 2999.996 2025-05-07T19:43:00.9414791Z cache size : 36608 KB 2025-05-07T19:43:00.9414876Z physical id : 1 2025-05-07T19:43:00.9414965Z siblings : 48 2025-05-07T19:43:00.9415044Z core id : 0 2025-05-07T19:43:00.9415123Z cpu cores : 24 2025-05-07T19:43:00.9415202Z apicid : 65 2025-05-07T19:43:00.9415288Z initial apicid : 65 2025-05-07T19:43:00.9415367Z fpu : yes 2025-05-07T19:43:00.9415447Z fpu_exception : yes 2025-05-07T19:43:00.9415528Z cpuid level : 13 2025-05-07T19:43:00.9415615Z wp : yes 2025-05-07T19:43:00.9417744Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9418217Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9418303Z bogomips : 5999.99 2025-05-07T19:43:00.9418384Z clflush size : 64 2025-05-07T19:43:00.9418467Z cache_alignment : 64 2025-05-07T19:43:00.9418607Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9418691Z power management: 2025-05-07T19:43:00.9418696Z 2025-05-07T19:43:00.9418777Z processor : 73 2025-05-07T19:43:00.9418876Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9418955Z cpu family : 6 2025-05-07T19:43:00.9419032Z model : 85 2025-05-07T19:43:00.9419205Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9419285Z stepping : 7 2025-05-07T19:43:00.9419367Z microcode : 0x5003901 2025-05-07T19:43:00.9419447Z cpu MHz : 2999.996 2025-05-07T19:43:00.9419542Z cache size : 36608 KB 2025-05-07T19:43:00.9419623Z physical id : 1 2025-05-07T19:43:00.9419700Z siblings : 48 2025-05-07T19:43:00.9419781Z core id : 1 2025-05-07T19:43:00.9419872Z cpu cores : 24 2025-05-07T19:43:00.9419950Z apicid : 67 2025-05-07T19:43:00.9420035Z initial apicid : 67 2025-05-07T19:43:00.9420122Z fpu : yes 2025-05-07T19:43:00.9420201Z fpu_exception : yes 2025-05-07T19:43:00.9420282Z cpuid level : 13 2025-05-07T19:43:00.9420361Z wp : yes 2025-05-07T19:43:00.9422644Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9423001Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9423086Z bogomips : 5999.99 2025-05-07T19:43:00.9423162Z clflush size : 64 2025-05-07T19:43:00.9423239Z cache_alignment : 64 2025-05-07T19:43:00.9423446Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9423607Z power management: 2025-05-07T19:43:00.9423612Z 2025-05-07T19:43:00.9423685Z processor : 74 2025-05-07T19:43:00.9423939Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9424024Z cpu family : 6 2025-05-07T19:43:00.9424102Z model : 85 2025-05-07T19:43:00.9424262Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9424422Z stepping : 7 2025-05-07T19:43:00.9424511Z microcode : 0x5003901 2025-05-07T19:43:00.9424588Z cpu MHz : 2999.996 2025-05-07T19:43:00.9424669Z cache size : 36608 KB 2025-05-07T19:43:00.9424759Z physical id : 1 2025-05-07T19:43:00.9424834Z siblings : 48 2025-05-07T19:43:00.9424910Z core id : 2 2025-05-07T19:43:00.9424993Z cpu cores : 24 2025-05-07T19:43:00.9425080Z apicid : 69 2025-05-07T19:43:00.9425164Z initial apicid : 69 2025-05-07T19:43:00.9425239Z fpu : yes 2025-05-07T19:43:00.9425330Z fpu_exception : yes 2025-05-07T19:43:00.9425409Z cpuid level : 13 2025-05-07T19:43:00.9425482Z wp : yes 2025-05-07T19:43:00.9427619Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9428048Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9428131Z bogomips : 5999.99 2025-05-07T19:43:00.9428215Z clflush size : 64 2025-05-07T19:43:00.9428296Z cache_alignment : 64 2025-05-07T19:43:00.9428421Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9428503Z power management: 2025-05-07T19:43:00.9428512Z 2025-05-07T19:43:00.9428597Z processor : 75 2025-05-07T19:43:00.9428682Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9428757Z cpu family : 6 2025-05-07T19:43:00.9428842Z model : 85 2025-05-07T19:43:00.9428997Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9429075Z stepping : 7 2025-05-07T19:43:00.9429155Z microcode : 0x5003901 2025-05-07T19:43:00.9429242Z cpu MHz : 2999.996 2025-05-07T19:43:00.9429322Z cache size : 36608 KB 2025-05-07T19:43:00.9429402Z physical id : 1 2025-05-07T19:43:00.9429487Z siblings : 48 2025-05-07T19:43:00.9429561Z core id : 3 2025-05-07T19:43:00.9429638Z cpu cores : 24 2025-05-07T19:43:00.9429714Z apicid : 71 2025-05-07T19:43:00.9429808Z initial apicid : 71 2025-05-07T19:43:00.9429886Z fpu : yes 2025-05-07T19:43:00.9429966Z fpu_exception : yes 2025-05-07T19:43:00.9430053Z cpuid level : 13 2025-05-07T19:43:00.9430127Z wp : yes 2025-05-07T19:43:00.9432246Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9432639Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9432724Z bogomips : 5999.99 2025-05-07T19:43:00.9432802Z clflush size : 64 2025-05-07T19:43:00.9432893Z cache_alignment : 64 2025-05-07T19:43:00.9433021Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9433103Z power management: 2025-05-07T19:43:00.9433107Z 2025-05-07T19:43:00.9433923Z processor : 76 2025-05-07T19:43:00.9434020Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9434099Z cpu family : 6 2025-05-07T19:43:00.9434176Z model : 85 2025-05-07T19:43:00.9434338Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9434418Z stepping : 7 2025-05-07T19:43:00.9434499Z microcode : 0x5003901 2025-05-07T19:43:00.9434578Z cpu MHz : 2188.452 2025-05-07T19:43:00.9434666Z cache size : 36608 KB 2025-05-07T19:43:00.9434745Z physical id : 1 2025-05-07T19:43:00.9434820Z siblings : 48 2025-05-07T19:43:00.9434903Z core id : 4 2025-05-07T19:43:00.9434980Z cpu cores : 24 2025-05-07T19:43:00.9435054Z apicid : 73 2025-05-07T19:43:00.9435135Z initial apicid : 73 2025-05-07T19:43:00.9435224Z fpu : yes 2025-05-07T19:43:00.9435306Z fpu_exception : yes 2025-05-07T19:43:00.9435384Z cpuid level : 13 2025-05-07T19:43:00.9435459Z wp : yes 2025-05-07T19:43:00.9437580Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9437937Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9438018Z bogomips : 5999.99 2025-05-07T19:43:00.9438146Z clflush size : 64 2025-05-07T19:43:00.9438224Z cache_alignment : 64 2025-05-07T19:43:00.9438348Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9438426Z power management: 2025-05-07T19:43:00.9438430Z 2025-05-07T19:43:00.9438503Z processor : 77 2025-05-07T19:43:00.9438587Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9438669Z cpu family : 6 2025-05-07T19:43:00.9438740Z model : 85 2025-05-07T19:43:00.9438884Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9438965Z stepping : 7 2025-05-07T19:43:00.9439041Z microcode : 0x5003901 2025-05-07T19:43:00.9439112Z cpu MHz : 2999.996 2025-05-07T19:43:00.9439189Z cache size : 36608 KB 2025-05-07T19:43:00.9439268Z physical id : 1 2025-05-07T19:43:00.9439338Z siblings : 48 2025-05-07T19:43:00.9439408Z core id : 5 2025-05-07T19:43:00.9439488Z cpu cores : 24 2025-05-07T19:43:00.9439557Z apicid : 75 2025-05-07T19:43:00.9439631Z initial apicid : 75 2025-05-07T19:43:00.9439702Z fpu : yes 2025-05-07T19:43:00.9439788Z fpu_exception : yes 2025-05-07T19:43:00.9439860Z cpuid level : 13 2025-05-07T19:43:00.9439928Z wp : yes 2025-05-07T19:43:00.9441897Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9442251Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9442325Z bogomips : 5999.99 2025-05-07T19:43:00.9442406Z clflush size : 64 2025-05-07T19:43:00.9442486Z cache_alignment : 64 2025-05-07T19:43:00.9442603Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9442686Z power management: 2025-05-07T19:43:00.9442691Z 2025-05-07T19:43:00.9442763Z processor : 78 2025-05-07T19:43:00.9442844Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9442966Z cpu family : 6 2025-05-07T19:43:00.9443043Z model : 85 2025-05-07T19:43:00.9443185Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9443256Z stepping : 7 2025-05-07T19:43:00.9443341Z microcode : 0x5003901 2025-05-07T19:43:00.9443413Z cpu MHz : 1983.427 2025-05-07T19:43:00.9443490Z cache size : 36608 KB 2025-05-07T19:43:00.9443562Z physical id : 1 2025-05-07T19:43:00.9443639Z siblings : 48 2025-05-07T19:43:00.9443710Z core id : 6 2025-05-07T19:43:00.9443781Z cpu cores : 24 2025-05-07T19:43:00.9443851Z apicid : 77 2025-05-07T19:43:00.9443931Z initial apicid : 77 2025-05-07T19:43:00.9444000Z fpu : yes 2025-05-07T19:43:00.9444076Z fpu_exception : yes 2025-05-07T19:43:00.9444167Z cpuid level : 13 2025-05-07T19:43:00.9444239Z wp : yes 2025-05-07T19:43:00.9446199Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9446567Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9446641Z bogomips : 5999.99 2025-05-07T19:43:00.9446715Z clflush size : 64 2025-05-07T19:43:00.9446800Z cache_alignment : 64 2025-05-07T19:43:00.9446966Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9447044Z power management: 2025-05-07T19:43:00.9447049Z 2025-05-07T19:43:00.9447127Z processor : 79 2025-05-07T19:43:00.9447207Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9447279Z cpu family : 6 2025-05-07T19:43:00.9447350Z model : 85 2025-05-07T19:43:00.9447506Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9447580Z stepping : 7 2025-05-07T19:43:00.9447657Z microcode : 0x5003901 2025-05-07T19:43:00.9447737Z cpu MHz : 2999.996 2025-05-07T19:43:00.9447811Z cache size : 36608 KB 2025-05-07T19:43:00.9447886Z physical id : 1 2025-05-07T19:43:00.9447956Z siblings : 48 2025-05-07T19:43:00.9448031Z core id : 7 2025-05-07T19:43:00.9448101Z cpu cores : 24 2025-05-07T19:43:00.9448174Z apicid : 79 2025-05-07T19:43:00.9448252Z initial apicid : 79 2025-05-07T19:43:00.9448331Z fpu : yes 2025-05-07T19:43:00.9448409Z fpu_exception : yes 2025-05-07T19:43:00.9448485Z cpuid level : 13 2025-05-07T19:43:00.9448563Z wp : yes 2025-05-07T19:43:00.9450526Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9450881Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9450965Z bogomips : 5999.99 2025-05-07T19:43:00.9451041Z clflush size : 64 2025-05-07T19:43:00.9451116Z cache_alignment : 64 2025-05-07T19:43:00.9451239Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9451317Z power management: 2025-05-07T19:43:00.9451321Z 2025-05-07T19:43:00.9451397Z processor : 80 2025-05-07T19:43:00.9451479Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9451560Z cpu family : 6 2025-05-07T19:43:00.9451630Z model : 85 2025-05-07T19:43:00.9451771Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9451908Z stepping : 7 2025-05-07T19:43:00.9451984Z microcode : 0x5003901 2025-05-07T19:43:00.9452056Z cpu MHz : 2999.996 2025-05-07T19:43:00.9452131Z cache size : 36608 KB 2025-05-07T19:43:00.9452211Z physical id : 1 2025-05-07T19:43:00.9452282Z siblings : 48 2025-05-07T19:43:00.9452355Z core id : 8 2025-05-07T19:43:00.9452441Z cpu cores : 24 2025-05-07T19:43:00.9452510Z apicid : 81 2025-05-07T19:43:00.9452583Z initial apicid : 81 2025-05-07T19:43:00.9452654Z fpu : yes 2025-05-07T19:43:00.9452737Z fpu_exception : yes 2025-05-07T19:43:00.9452808Z cpuid level : 13 2025-05-07T19:43:00.9452878Z wp : yes 2025-05-07T19:43:00.9454850Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9455203Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9455278Z bogomips : 5999.99 2025-05-07T19:43:00.9455360Z clflush size : 64 2025-05-07T19:43:00.9455435Z cache_alignment : 64 2025-05-07T19:43:00.9455554Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9455637Z power management: 2025-05-07T19:43:00.9455688Z 2025-05-07T19:43:00.9455762Z processor : 81 2025-05-07T19:43:00.9455841Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9455914Z cpu family : 6 2025-05-07T19:43:00.9455992Z model : 85 2025-05-07T19:43:00.9456132Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9456206Z stepping : 7 2025-05-07T19:43:00.9456293Z microcode : 0x5003901 2025-05-07T19:43:00.9456366Z cpu MHz : 1473.905 2025-05-07T19:43:00.9456441Z cache size : 36608 KB 2025-05-07T19:43:00.9456512Z physical id : 1 2025-05-07T19:43:00.9456591Z siblings : 48 2025-05-07T19:43:00.9456659Z core id : 9 2025-05-07T19:43:00.9456729Z cpu cores : 24 2025-05-07T19:43:00.9456806Z apicid : 83 2025-05-07T19:43:00.9456882Z initial apicid : 83 2025-05-07T19:43:00.9456950Z fpu : yes 2025-05-07T19:43:00.9457028Z fpu_exception : yes 2025-05-07T19:43:00.9457110Z cpuid level : 13 2025-05-07T19:43:00.9457183Z wp : yes 2025-05-07T19:43:00.9459144Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9459503Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9459578Z bogomips : 5999.99 2025-05-07T19:43:00.9459651Z clflush size : 64 2025-05-07T19:43:00.9459735Z cache_alignment : 64 2025-05-07T19:43:00.9459853Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9459933Z power management: 2025-05-07T19:43:00.9459937Z 2025-05-07T19:43:00.9460023Z processor : 82 2025-05-07T19:43:00.9460103Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9460177Z cpu family : 6 2025-05-07T19:43:00.9460247Z model : 85 2025-05-07T19:43:00.9460396Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9460469Z stepping : 7 2025-05-07T19:43:00.9460609Z microcode : 0x5003901 2025-05-07T19:43:00.9460688Z cpu MHz : 1400.674 2025-05-07T19:43:00.9460764Z cache size : 36608 KB 2025-05-07T19:43:00.9460838Z physical id : 1 2025-05-07T19:43:00.9460908Z siblings : 48 2025-05-07T19:43:00.9460983Z core id : 10 2025-05-07T19:43:00.9461055Z cpu cores : 24 2025-05-07T19:43:00.9461126Z apicid : 85 2025-05-07T19:43:00.9461210Z initial apicid : 85 2025-05-07T19:43:00.9461281Z fpu : yes 2025-05-07T19:43:00.9461357Z fpu_exception : yes 2025-05-07T19:43:00.9461427Z cpuid level : 13 2025-05-07T19:43:00.9461501Z wp : yes 2025-05-07T19:43:00.9463566Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9464120Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9464201Z bogomips : 5999.99 2025-05-07T19:43:00.9464281Z clflush size : 64 2025-05-07T19:43:00.9464408Z cache_alignment : 64 2025-05-07T19:43:00.9464541Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9464621Z power management: 2025-05-07T19:43:00.9464626Z 2025-05-07T19:43:00.9464703Z processor : 83 2025-05-07T19:43:00.9464859Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9464938Z cpu family : 6 2025-05-07T19:43:00.9465011Z model : 85 2025-05-07T19:43:00.9465165Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9465251Z stepping : 7 2025-05-07T19:43:00.9465331Z microcode : 0x5003901 2025-05-07T19:43:00.9465410Z cpu MHz : 2999.996 2025-05-07T19:43:00.9465501Z cache size : 36608 KB 2025-05-07T19:43:00.9465580Z physical id : 1 2025-05-07T19:43:00.9465659Z siblings : 48 2025-05-07T19:43:00.9465738Z core id : 11 2025-05-07T19:43:00.9465825Z cpu cores : 24 2025-05-07T19:43:00.9465902Z apicid : 87 2025-05-07T19:43:00.9465987Z initial apicid : 87 2025-05-07T19:43:00.9466071Z fpu : yes 2025-05-07T19:43:00.9466152Z fpu_exception : yes 2025-05-07T19:43:00.9466234Z cpuid level : 13 2025-05-07T19:43:00.9466313Z wp : yes 2025-05-07T19:43:00.9468442Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9468824Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9468916Z bogomips : 5999.99 2025-05-07T19:43:00.9468993Z clflush size : 64 2025-05-07T19:43:00.9469079Z cache_alignment : 64 2025-05-07T19:43:00.9469206Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9469300Z power management: 2025-05-07T19:43:00.9469304Z 2025-05-07T19:43:00.9469385Z processor : 84 2025-05-07T19:43:00.9469472Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9469560Z cpu family : 6 2025-05-07T19:43:00.9469639Z model : 85 2025-05-07T19:43:00.9469798Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9469882Z stepping : 7 2025-05-07T19:43:00.9469975Z microcode : 0x5003901 2025-05-07T19:43:00.9470052Z cpu MHz : 1847.249 2025-05-07T19:43:00.9470189Z cache size : 36608 KB 2025-05-07T19:43:00.9470279Z physical id : 1 2025-05-07T19:43:00.9470355Z siblings : 48 2025-05-07T19:43:00.9470434Z core id : 12 2025-05-07T19:43:00.9470514Z cpu cores : 24 2025-05-07T19:43:00.9470601Z apicid : 89 2025-05-07T19:43:00.9470683Z initial apicid : 89 2025-05-07T19:43:00.9470760Z fpu : yes 2025-05-07T19:43:00.9470854Z fpu_exception : yes 2025-05-07T19:43:00.9470932Z cpuid level : 13 2025-05-07T19:43:00.9471011Z wp : yes 2025-05-07T19:43:00.9473150Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9473531Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9473613Z bogomips : 5999.99 2025-05-07T19:43:00.9473705Z clflush size : 64 2025-05-07T19:43:00.9473789Z cache_alignment : 64 2025-05-07T19:43:00.9473920Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9474004Z power management: 2025-05-07T19:43:00.9474009Z 2025-05-07T19:43:00.9474097Z processor : 85 2025-05-07T19:43:00.9474182Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9474263Z cpu family : 6 2025-05-07T19:43:00.9474350Z model : 85 2025-05-07T19:43:00.9474554Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9474638Z stepping : 7 2025-05-07T19:43:00.9474723Z microcode : 0x5003901 2025-05-07T19:43:00.9474809Z cpu MHz : 2999.996 2025-05-07T19:43:00.9474893Z cache size : 36608 KB 2025-05-07T19:43:00.9474976Z physical id : 1 2025-05-07T19:43:00.9475058Z siblings : 48 2025-05-07T19:43:00.9475132Z core id : 13 2025-05-07T19:43:00.9475212Z cpu cores : 24 2025-05-07T19:43:00.9475287Z apicid : 91 2025-05-07T19:43:00.9475380Z initial apicid : 91 2025-05-07T19:43:00.9475457Z fpu : yes 2025-05-07T19:43:00.9475542Z fpu_exception : yes 2025-05-07T19:43:00.9475630Z cpuid level : 13 2025-05-07T19:43:00.9475705Z wp : yes 2025-05-07T19:43:00.9477793Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9478158Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9478233Z bogomips : 5999.99 2025-05-07T19:43:00.9478303Z clflush size : 64 2025-05-07T19:43:00.9478389Z cache_alignment : 64 2025-05-07T19:43:00.9478505Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9478583Z power management: 2025-05-07T19:43:00.9478588Z 2025-05-07T19:43:00.9478662Z processor : 86 2025-05-07T19:43:00.9478753Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9478824Z cpu family : 6 2025-05-07T19:43:00.9478896Z model : 85 2025-05-07T19:43:00.9479055Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9479127Z stepping : 7 2025-05-07T19:43:00.9479202Z microcode : 0x5003901 2025-05-07T19:43:00.9479273Z cpu MHz : 2999.996 2025-05-07T19:43:00.9479357Z cache size : 36608 KB 2025-05-07T19:43:00.9479429Z physical id : 1 2025-05-07T19:43:00.9479500Z siblings : 48 2025-05-07T19:43:00.9479667Z core id : 14 2025-05-07T19:43:00.9479736Z cpu cores : 24 2025-05-07T19:43:00.9479807Z apicid : 93 2025-05-07T19:43:00.9479883Z initial apicid : 93 2025-05-07T19:43:00.9479965Z fpu : yes 2025-05-07T19:43:00.9480041Z fpu_exception : yes 2025-05-07T19:43:00.9480116Z cpuid level : 13 2025-05-07T19:43:00.9480189Z wp : yes 2025-05-07T19:43:00.9482155Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9482511Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9482597Z bogomips : 5999.99 2025-05-07T19:43:00.9482670Z clflush size : 64 2025-05-07T19:43:00.9482746Z cache_alignment : 64 2025-05-07T19:43:00.9482876Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9482951Z power management: 2025-05-07T19:43:00.9482956Z 2025-05-07T19:43:00.9483027Z processor : 87 2025-05-07T19:43:00.9483108Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9483186Z cpu family : 6 2025-05-07T19:43:00.9483255Z model : 85 2025-05-07T19:43:00.9483401Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9483531Z stepping : 7 2025-05-07T19:43:00.9483608Z microcode : 0x5003901 2025-05-07T19:43:00.9483679Z cpu MHz : 1474.711 2025-05-07T19:43:00.9483759Z cache size : 36608 KB 2025-05-07T19:43:00.9483846Z physical id : 1 2025-05-07T19:43:00.9483917Z siblings : 48 2025-05-07T19:43:00.9483988Z core id : 15 2025-05-07T19:43:00.9484074Z cpu cores : 24 2025-05-07T19:43:00.9484143Z apicid : 95 2025-05-07T19:43:00.9484220Z initial apicid : 95 2025-05-07T19:43:00.9484293Z fpu : yes 2025-05-07T19:43:00.9484380Z fpu_exception : yes 2025-05-07T19:43:00.9484455Z cpuid level : 13 2025-05-07T19:43:00.9484527Z wp : yes 2025-05-07T19:43:00.9486499Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9486850Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9486929Z bogomips : 5999.99 2025-05-07T19:43:00.9487014Z clflush size : 64 2025-05-07T19:43:00.9487088Z cache_alignment : 64 2025-05-07T19:43:00.9487207Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9487285Z power management: 2025-05-07T19:43:00.9487297Z 2025-05-07T19:43:00.9487370Z processor : 88 2025-05-07T19:43:00.9487456Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9487530Z cpu family : 6 2025-05-07T19:43:00.9487603Z model : 85 2025-05-07T19:43:00.9487753Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9487827Z stepping : 7 2025-05-07T19:43:00.9487916Z microcode : 0x5003901 2025-05-07T19:43:00.9487989Z cpu MHz : 2999.996 2025-05-07T19:43:00.9488066Z cache size : 36608 KB 2025-05-07T19:43:00.9488140Z physical id : 1 2025-05-07T19:43:00.9488221Z siblings : 48 2025-05-07T19:43:00.9488292Z core id : 16 2025-05-07T19:43:00.9488366Z cpu cores : 24 2025-05-07T19:43:00.9488490Z apicid : 97 2025-05-07T19:43:00.9488574Z initial apicid : 97 2025-05-07T19:43:00.9488647Z fpu : yes 2025-05-07T19:43:00.9488726Z fpu_exception : yes 2025-05-07T19:43:00.9488809Z cpuid level : 13 2025-05-07T19:43:00.9488882Z wp : yes 2025-05-07T19:43:00.9490843Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9491205Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9491282Z bogomips : 5999.99 2025-05-07T19:43:00.9491356Z clflush size : 64 2025-05-07T19:43:00.9491446Z cache_alignment : 64 2025-05-07T19:43:00.9491564Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9491642Z power management: 2025-05-07T19:43:00.9491646Z 2025-05-07T19:43:00.9491733Z processor : 89 2025-05-07T19:43:00.9491814Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9491884Z cpu family : 6 2025-05-07T19:43:00.9491951Z model : 85 2025-05-07T19:43:00.9492103Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9492175Z stepping : 7 2025-05-07T19:43:00.9492248Z microcode : 0x5003901 2025-05-07T19:43:00.9492381Z cpu MHz : 2973.630 2025-05-07T19:43:00.9492462Z cache size : 36608 KB 2025-05-07T19:43:00.9492537Z physical id : 1 2025-05-07T19:43:00.9492606Z siblings : 48 2025-05-07T19:43:00.9492682Z core id : 17 2025-05-07T19:43:00.9492753Z cpu cores : 24 2025-05-07T19:43:00.9492828Z apicid : 99 2025-05-07T19:43:00.9492906Z initial apicid : 99 2025-05-07T19:43:00.9492986Z fpu : yes 2025-05-07T19:43:00.9493061Z fpu_exception : yes 2025-05-07T19:43:00.9493136Z cpuid level : 13 2025-05-07T19:43:00.9493216Z wp : yes 2025-05-07T19:43:00.9495173Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9495529Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9495618Z bogomips : 5999.99 2025-05-07T19:43:00.9495692Z clflush size : 64 2025-05-07T19:43:00.9495774Z cache_alignment : 64 2025-05-07T19:43:00.9495896Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9495973Z power management: 2025-05-07T19:43:00.9495977Z 2025-05-07T19:43:00.9496050Z processor : 90 2025-05-07T19:43:00.9496131Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9496215Z cpu family : 6 2025-05-07T19:43:00.9496284Z model : 85 2025-05-07T19:43:00.9496427Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9496511Z stepping : 7 2025-05-07T19:43:00.9496588Z microcode : 0x5003901 2025-05-07T19:43:00.9496660Z cpu MHz : 2999.996 2025-05-07T19:43:00.9496739Z cache size : 36608 KB 2025-05-07T19:43:00.9496818Z physical id : 1 2025-05-07T19:43:00.9496887Z siblings : 48 2025-05-07T19:43:00.9496957Z core id : 18 2025-05-07T19:43:00.9497035Z cpu cores : 24 2025-05-07T19:43:00.9497106Z apicid : 101 2025-05-07T19:43:00.9497181Z initial apicid : 101 2025-05-07T19:43:00.9497303Z fpu : yes 2025-05-07T19:43:00.9497387Z fpu_exception : yes 2025-05-07T19:43:00.9497458Z cpuid level : 13 2025-05-07T19:43:00.9497528Z wp : yes 2025-05-07T19:43:00.9499492Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9499841Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9499915Z bogomips : 5999.99 2025-05-07T19:43:00.9500000Z clflush size : 64 2025-05-07T19:43:00.9500074Z cache_alignment : 64 2025-05-07T19:43:00.9500189Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9500270Z power management: 2025-05-07T19:43:00.9500274Z 2025-05-07T19:43:00.9500344Z processor : 91 2025-05-07T19:43:00.9500421Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9500489Z cpu family : 6 2025-05-07T19:43:00.9500566Z model : 85 2025-05-07T19:43:00.9500710Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9500782Z stepping : 7 2025-05-07T19:43:00.9500869Z microcode : 0x5003901 2025-05-07T19:43:00.9500941Z cpu MHz : 1767.690 2025-05-07T19:43:00.9501016Z cache size : 36608 KB 2025-05-07T19:43:00.9501138Z physical id : 1 2025-05-07T19:43:00.9501217Z siblings : 48 2025-05-07T19:43:00.9501286Z core id : 19 2025-05-07T19:43:00.9501359Z cpu cores : 24 2025-05-07T19:43:00.9501438Z apicid : 103 2025-05-07T19:43:00.9501513Z initial apicid : 103 2025-05-07T19:43:00.9501582Z fpu : yes 2025-05-07T19:43:00.9501662Z fpu_exception : yes 2025-05-07T19:43:00.9501741Z cpuid level : 13 2025-05-07T19:43:00.9501810Z wp : yes 2025-05-07T19:43:00.9504046Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9504440Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9504521Z bogomips : 5999.99 2025-05-07T19:43:00.9504600Z clflush size : 64 2025-05-07T19:43:00.9504693Z cache_alignment : 64 2025-05-07T19:43:00.9504822Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9504904Z power management: 2025-05-07T19:43:00.9504909Z 2025-05-07T19:43:00.9504995Z processor : 92 2025-05-07T19:43:00.9505082Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9505161Z cpu family : 6 2025-05-07T19:43:00.9505237Z model : 85 2025-05-07T19:43:00.9505402Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9505480Z stepping : 7 2025-05-07T19:43:00.9505565Z microcode : 0x5003901 2025-05-07T19:43:00.9505649Z cpu MHz : 1427.544 2025-05-07T19:43:00.9505728Z cache size : 36608 KB 2025-05-07T19:43:00.9505806Z physical id : 1 2025-05-07T19:43:00.9505881Z siblings : 48 2025-05-07T19:43:00.9505969Z core id : 20 2025-05-07T19:43:00.9506044Z cpu cores : 24 2025-05-07T19:43:00.9506119Z apicid : 105 2025-05-07T19:43:00.9506208Z initial apicid : 105 2025-05-07T19:43:00.9506282Z fpu : yes 2025-05-07T19:43:00.9506365Z fpu_exception : yes 2025-05-07T19:43:00.9506509Z cpuid level : 13 2025-05-07T19:43:00.9506591Z wp : yes 2025-05-07T19:43:00.9509070Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9509461Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9509540Z bogomips : 5999.99 2025-05-07T19:43:00.9509618Z clflush size : 64 2025-05-07T19:43:00.9509701Z cache_alignment : 64 2025-05-07T19:43:00.9509834Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9509917Z power management: 2025-05-07T19:43:00.9509922Z 2025-05-07T19:43:00.9510002Z processor : 93 2025-05-07T19:43:00.9510096Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9510171Z cpu family : 6 2025-05-07T19:43:00.9510246Z model : 85 2025-05-07T19:43:00.9510402Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9510486Z stepping : 7 2025-05-07T19:43:00.9510565Z microcode : 0x5003901 2025-05-07T19:43:00.9510642Z cpu MHz : 1478.046 2025-05-07T19:43:00.9510728Z cache size : 36608 KB 2025-05-07T19:43:00.9510806Z physical id : 1 2025-05-07T19:43:00.9510882Z siblings : 48 2025-05-07T19:43:00.9510954Z core id : 21 2025-05-07T19:43:00.9511130Z cpu cores : 24 2025-05-07T19:43:00.9511207Z apicid : 107 2025-05-07T19:43:00.9511288Z initial apicid : 107 2025-05-07T19:43:00.9511365Z fpu : yes 2025-05-07T19:43:00.9511444Z fpu_exception : yes 2025-05-07T19:43:00.9511519Z cpuid level : 13 2025-05-07T19:43:00.9511591Z wp : yes 2025-05-07T19:43:00.9513739Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9514127Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9514229Z bogomips : 5999.99 2025-05-07T19:43:00.9514315Z clflush size : 64 2025-05-07T19:43:00.9514404Z cache_alignment : 64 2025-05-07T19:43:00.9514538Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9514649Z power management: 2025-05-07T19:43:00.9514654Z 2025-05-07T19:43:00.9514739Z processor : 94 2025-05-07T19:43:00.9514836Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9514935Z cpu family : 6 2025-05-07T19:43:00.9515016Z model : 85 2025-05-07T19:43:00.9515177Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9515260Z stepping : 7 2025-05-07T19:43:00.9515362Z microcode : 0x5003901 2025-05-07T19:43:00.9515446Z cpu MHz : 1824.435 2025-05-07T19:43:00.9515534Z cache size : 36608 KB 2025-05-07T19:43:00.9515632Z physical id : 1 2025-05-07T19:43:00.9515713Z siblings : 48 2025-05-07T19:43:00.9515793Z core id : 22 2025-05-07T19:43:00.9515880Z cpu cores : 24 2025-05-07T19:43:00.9515978Z apicid : 109 2025-05-07T19:43:00.9516067Z initial apicid : 109 2025-05-07T19:43:00.9516146Z fpu : yes 2025-05-07T19:43:00.9516234Z fpu_exception : yes 2025-05-07T19:43:00.9516331Z cpuid level : 13 2025-05-07T19:43:00.9516412Z wp : yes 2025-05-07T19:43:00.9518538Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9519013Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9519099Z bogomips : 5999.99 2025-05-07T19:43:00.9519184Z clflush size : 64 2025-05-07T19:43:00.9519285Z cache_alignment : 64 2025-05-07T19:43:00.9519417Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9519505Z power management: 2025-05-07T19:43:00.9519513Z 2025-05-07T19:43:00.9519618Z processor : 95 2025-05-07T19:43:00.9519715Z vendor_id : GenuineIntel 2025-05-07T19:43:00.9519799Z cpu family : 6 2025-05-07T19:43:00.9519896Z model : 85 2025-05-07T19:43:00.9520059Z model name : Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:43:00.9520146Z stepping : 7 2025-05-07T19:43:00.9520346Z microcode : 0x5003901 2025-05-07T19:43:00.9520445Z cpu MHz : 2999.996 2025-05-07T19:43:00.9520526Z cache size : 36608 KB 2025-05-07T19:43:00.9520606Z physical id : 1 2025-05-07T19:43:00.9520697Z siblings : 48 2025-05-07T19:43:00.9520773Z core id : 23 2025-05-07T19:43:00.9520851Z cpu cores : 24 2025-05-07T19:43:00.9520928Z apicid : 111 2025-05-07T19:43:00.9521080Z initial apicid : 111 2025-05-07T19:43:00.9521157Z fpu : yes 2025-05-07T19:43:00.9521238Z fpu_exception : yes 2025-05-07T19:43:00.9521316Z cpuid level : 13 2025-05-07T19:43:00.9521407Z wp : yes 2025-05-07T19:43:00.9523371Z flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 
2025-05-07T19:43:00.9523744Z bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data retbleed gds bhi 2025-05-07T19:43:00.9523825Z bogomips : 5999.99 2025-05-07T19:43:00.9523901Z clflush size : 64 2025-05-07T19:43:00.9523983Z cache_alignment : 64 2025-05-07T19:43:00.9524114Z address sizes : 46 bits physical, 48 bits virtual 2025-05-07T19:43:00.9524193Z power management: 2025-05-07T19:43:00.9524198Z 2025-05-07T19:43:00.9524205Z 2025-05-07T19:43:00.9524314Z ################################################################################ 2025-05-07T19:43:00.9524415Z [INFO] Print PCI info ... 2025-05-07T19:43:00.9524494Z + lspci -v 2025-05-07T19:43:00.9524498Z 2025-05-07T19:43:00.9524667Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-05-07T19:43:00.9524779Z Subsystem: Amazon.com, Inc. Device 1237 2025-05-07T19:43:00.9524887Z Flags: bus master, medium devsel, latency 0 2025-05-07T19:43:00.9524891Z 2025-05-07T19:43:00.9525075Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-05-07T19:43:00.9525158Z Physical Slot: 1 2025-05-07T19:43:00.9525261Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:00.9525265Z 2025-05-07T19:43:00.9525498Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-05-07T19:43:00.9525577Z Physical Slot: 1 2025-05-07T19:43:00.9525699Z Flags: bus master, fast devsel, latency 0, IRQ 9 2025-05-07T19:43:00.9525753Z 2025-05-07T19:43:00.9525999Z 00:03.0 VGA compatible controller: Amazon.com, Inc. 
Device 1111 (prog-if 00 [VGA controller]) 2025-05-07T19:43:00.9526075Z Physical Slot: 3 2025-05-07T19:43:00.9526183Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:00.9526305Z Memory at c0000000 (32-bit, prefetchable) [size=4M] 2025-05-07T19:43:00.9526421Z Expansion ROM at 000c0000 [disabled] [size=128K] 2025-05-07T19:43:00.9526425Z 2025-05-07T19:43:00.9526719Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller (prog-if 02 [NVM Express]) 2025-05-07T19:43:00.9526820Z Subsystem: Amazon.com, Inc. Device 0000 2025-05-07T19:43:00.9526896Z Physical Slot: 4 2025-05-07T19:43:00.9527026Z Flags: bus master, fast devsel, latency 0, IRQ 11 2025-05-07T19:43:00.9527173Z Memory at c0514000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:43:00.9527262Z Capabilities: 2025-05-07T19:43:00.9527347Z Kernel driver in use: nvme 2025-05-07T19:43:00.9527352Z 2025-05-07T19:43:00.9527563Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-05-07T19:43:00.9527640Z Physical Slot: 5 2025-05-07T19:43:00.9527742Z Flags: bus master, fast devsel, latency 0 2025-05-07T19:43:00.9527888Z Memory at c0510000 (32-bit, non-prefetchable) [size=16K] 2025-05-07T19:43:00.9528009Z Memory at c0400000 (32-bit, prefetchable) [size=1M] 2025-05-07T19:43:00.9528152Z Memory at c0500000 (32-bit, non-prefetchable) [size=64K] 2025-05-07T19:43:00.9528253Z Capabilities: 2025-05-07T19:43:00.9528337Z Kernel driver in use: ena 2025-05-07T19:43:00.9528341Z 2025-05-07T19:43:00.9528345Z 2025-05-07T19:43:00.9528515Z ################################################################################ 2025-05-07T19:43:00.9528616Z [INFO] Print Linux distribution info ... 
2025-05-07T19:43:00.9528702Z + uname -a 2025-05-07T19:43:00.9528707Z 2025-05-07T19:43:00.9529095Z Linux 3634184bf123 6.1.130-139.222.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-05-07T19:43:00.9529104Z 2025-05-07T19:43:00.9529176Z + uname -m 2025-05-07T19:43:00.9529190Z 2025-05-07T19:43:00.9529255Z x86_64 2025-05-07T19:43:00.9529259Z 2025-05-07T19:43:00.9529338Z + cat /proc/version 2025-05-07T19:43:00.9529342Z 2025-05-07T19:43:00.9529888Z Linux version 6.1.130-139.222.amzn2023.x86_64 (mockbuild@ip-10-0-55-76) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.39-6.amzn2023.0.11) #1 SMP PREEMPT_DYNAMIC Tue Mar 11 01:10:58 UTC 2025 2025-05-07T19:43:00.9529904Z 2025-05-07T19:43:00.9529982Z + cat /etc/os-release 2025-05-07T19:43:00.9529986Z 2025-05-07T19:43:00.9530062Z NAME="Amazon Linux" 2025-05-07T19:43:00.9530134Z VERSION="2023" 2025-05-07T19:43:00.9530222Z ID="amzn" 2025-05-07T19:43:00.9530297Z ID_LIKE="fedora" 2025-05-07T19:43:00.9530374Z VERSION_ID="2023" 2025-05-07T19:43:00.9530480Z PLATFORM_ID="platform:al2023" 2025-05-07T19:43:00.9530580Z PRETTY_NAME="Amazon Linux 2023.7.20250428" 2025-05-07T19:43:00.9530655Z ANSI_COLOR="0;33" 2025-05-07T19:43:00.9530772Z CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023" 2025-05-07T19:43:00.9530952Z HOME_URL="https://aws.amazon.com/linux/amazon-linux-2023/" 2025-05-07T19:43:00.9531102Z DOCUMENTATION_URL="https://docs.aws.amazon.com/linux/" 2025-05-07T19:43:00.9531245Z SUPPORT_URL="https://aws.amazon.com/premiumsupport/" 2025-05-07T19:43:00.9531436Z BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023" 2025-05-07T19:43:00.9531510Z VENDOR_NAME="AWS" 2025-05-07T19:43:00.9531610Z VENDOR_URL="https://aws.amazon.com/" 2025-05-07T19:43:00.9531704Z SUPPORT_END="2029-06-30" 2025-05-07T19:43:00.9531708Z 2025-05-07T19:43:00.9574075Z ##[group]Run . $PRELUDE; print_gpu_info 2025-05-07T19:43:00.9574211Z . 
$PRELUDE; print_gpu_info 2025-05-07T19:43:00.9574452Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:00.9574520Z env: 2025-05-07T19:43:00.9574624Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:00.9574719Z BUILD_ENV: build_binary 2025-05-07T19:43:00.9574886Z BUILD_TARGET: genai 2025-05-07T19:43:00.9574960Z BUILD_VARIANT: cuda 2025-05-07T19:43:00.9575050Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:00.9575125Z ##[endgroup] 2025-05-07T19:43:01.4027916Z ################################################################################ 2025-05-07T19:43:01.4028350Z [INFO] Printing general display info ... 2025-05-07T19:43:01.4045609Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:01.4884025Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:01.4887901Z /usr/bin/sudo 2025-05-07T19:43:01.4898148Z which: no apt-get in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:01.4904871Z /usr/bin/yum 2025-05-07T19:43:01.4905164Z [INSTALL] Updating system repositories ... 2025-05-07T19:43:01.4927268Z [EXEC] [ATTEMPT 0/3] + sudo yum update -y 2025-05-07T19:43:01.7106572Z Last metadata expiration check: 0:00:17 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:43:01.8072669Z Dependencies resolved. 2025-05-07T19:43:01.8289695Z Nothing to do. 2025-05-07T19:43:01.8290068Z Complete! 2025-05-07T19:43:01.8674721Z [INSTALL] Installing system package(s): hostname lshw ... 2025-05-07T19:43:01.8698255Z [EXEC] [ATTEMPT 0/3] + sudo yum install -y hostname lshw 2025-05-07T19:43:02.0835579Z Last metadata expiration check: 0:00:18 ago on Wed May 7 19:42:44 2025. 2025-05-07T19:43:02.1349193Z Dependencies resolved. 
2025-05-07T19:43:02.1515010Z ================================================================================ 2025-05-07T19:43:02.1516438Z Package Arch Version Repository Size 2025-05-07T19:43:02.1517740Z ================================================================================ 2025-05-07T19:43:02.1518069Z Installing: 2025-05-07T19:43:02.1518377Z hostname x86_64 3.23-4.amzn2023.0.3 amazonlinux 28 k 2025-05-07T19:43:02.1518859Z lshw x86_64 B.02.19.2-7.amzn2023.0.3 amazonlinux 319 k 2025-05-07T19:43:02.1519147Z 2025-05-07T19:43:02.1519243Z Transaction Summary 2025-05-07T19:43:02.1519512Z ================================================================================ 2025-05-07T19:43:02.1519840Z Install 2 Packages 2025-05-07T19:43:02.1519977Z 2025-05-07T19:43:02.1520077Z Total download size: 347 k 2025-05-07T19:43:02.1520358Z Installed size: 883 k 2025-05-07T19:43:02.1520596Z Downloading Packages: 2025-05-07T19:43:02.4317215Z (1/2): hostname-3.23-4.amzn2023.0.3.x86_64.rpm 1.4 MB/s | 28 kB 00:00 2025-05-07T19:43:02.4416728Z (2/2): lshw-B.02.19.2-7.amzn2023.0.3.x86_64.rpm 11 MB/s | 319 kB 00:00 2025-05-07T19:43:02.4421822Z -------------------------------------------------------------------------------- 2025-05-07T19:43:02.4425207Z Total 1.2 MB/s | 347 kB 00:00 2025-05-07T19:43:02.4646143Z Running transaction check 2025-05-07T19:43:02.4697387Z Transaction check succeeded. 2025-05-07T19:43:02.4697949Z Running transaction test 2025-05-07T19:43:02.4857300Z Transaction test succeeded. 
2025-05-07T19:43:02.4858178Z Running transaction 2025-05-07T19:43:02.5185421Z Preparing : 1/1 2025-05-07T19:43:02.5284036Z Installing : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:02.5344711Z Installing : hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:03.5835665Z Running scriptlet: hostname-3.23-4.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:03.5837217Z Verifying : hostname-3.23-4.amzn2023.0.3.x86_64 1/2 2025-05-07T19:43:03.6208576Z Verifying : lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2/2 2025-05-07T19:43:03.6210038Z 2025-05-07T19:43:03.6210316Z Installed: 2025-05-07T19:43:03.6211289Z hostname-3.23-4.amzn2023.0.3.x86_64 lshw-B.02.19.2-7.amzn2023.0.3.x86_64 2025-05-07T19:43:03.6212282Z 2025-05-07T19:43:03.6212520Z Complete! 2025-05-07T19:43:03.6553064Z + hostname 2025-05-07T19:43:03.6553503Z 2025-05-07T19:43:03.6559923Z 3634184bf123 2025-05-07T19:43:03.6561220Z 2025-05-07T19:43:03.6561795Z + sudo lshw -C display 2025-05-07T19:43:03.6562320Z 2025-05-07T19:43:03.8521035Z *-display UNCLAIMED 2025-05-07T19:43:03.8521577Z description: VGA compatible controller 2025-05-07T19:43:03.8521936Z product: Amazon.com, Inc. 2025-05-07T19:43:03.8522239Z vendor: Amazon.com, Inc. 2025-05-07T19:43:03.8522501Z physical id: 3 2025-05-07T19:43:03.8522756Z bus info: pci@0000:00:03.0 2025-05-07T19:43:03.8523019Z version: 00 2025-05-07T19:43:03.8523395Z width: 32 bits 2025-05-07T19:43:03.8523610Z clock: 33MHz 2025-05-07T19:43:03.8523872Z capabilities: vga_controller bus_master 2025-05-07T19:43:03.8524202Z configuration: latency=0 2025-05-07T19:43:03.8524515Z resources: memory:c0000000-c03fffff memory:c0000-dffff 2025-05-07T19:43:03.8543680Z 2025-05-07T19:43:03.8544391Z ################################################################################ 2025-05-07T19:43:03.8545522Z [INFO] Printing NVIDIA GPU info ... 
2025-05-07T19:43:03.8649406Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:03.8670118Z which: no nvidia-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:03.8671518Z [CHECK] nvidia-smi not found 2025-05-07T19:43:03.8672386Z ################################################################################ 2025-05-07T19:43:03.8673347Z [INFO] Printing AMD GPU info ... 2025-05-07T19:43:03.8810863Z lspci: Unable to load libkmod resources: error -2 2025-05-07T19:43:03.8846187Z which: no rocminfo in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:03.8846769Z [CHECK] rocminfo not found 2025-05-07T19:43:03.8853465Z which: no rocm-smi in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:43:03.8853995Z [CHECK] rocm-smi not found 2025-05-07T19:43:03.8928068Z ##[group]Run . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:03.8928584Z . $PRELUDE; setup_miniconda $HOME/miniconda 2025-05-07T19:43:03.8929180Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:03.8929546Z env: 2025-05-07T19:43:03.8929788Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:03.8930141Z BUILD_ENV: build_binary 2025-05-07T19:43:03.8930410Z BUILD_TARGET: genai 2025-05-07T19:43:03.8930688Z BUILD_VARIANT: cuda 2025-05-07T19:43:03.8930961Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:03.8931225Z ##[endgroup] 2025-05-07T19:43:04.3758781Z ################################################################################ 2025-05-07T19:43:04.3759229Z # Setup Miniconda 2025-05-07T19:43:04.3759456Z # 2025-05-07T19:43:04.3779864Z # [2025-05-07T19:43:04.377Z] + setup_miniconda /github/home/miniconda 2025-05-07T19:43:04.3780572Z ################################################################################ 2025-05-07T19:43:04.3781011Z 2025-05-07T19:43:04.3792712Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:04.4678914Z [CHECK] Network 
does not appear to be blocked. 2025-05-07T19:43:04.4680764Z + mkdir -p /github/home/miniconda 2025-05-07T19:43:04.4681794Z 2025-05-07T19:43:04.4693658Z 2025-05-07T19:43:04.4695045Z [SETUP] Downloading the Miniconda installer ... 2025-05-07T19:43:04.4712270Z [EXEC] [ATTEMPT 0/3] + wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh 2025-05-07T19:43:05.2742948Z [SETUP] Installing Miniconda ... 2025-05-07T19:43:05.2743572Z + bash miniconda.sh -b -p /github/home/miniconda -u 2025-05-07T19:43:05.2743884Z 2025-05-07T19:43:05.2888056Z PREFIX=/github/home/miniconda 2025-05-07T19:43:05.6496263Z Unpacking payload ... 2025-05-07T19:43:06.1334756Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:06.8086595Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:08.6988015Z 2025-05-07T19:43:08.6988926Z Installing base environment... 2025-05-07T19:43:08.6989621Z 2025-05-07T19:43:09.6954718Z Preparing transaction: ...working... done 2025-05-07T19:43:12.5727969Z Executing transaction: ...working... done 2025-05-07T19:43:13.1238413Z entry_point.py:256: DeprecationWarning: Python 3.14 will, by default, filter extracted tar archives and reject files or modify their metadata. Use the filter argument to control this behavior. 2025-05-07T19:43:13.1932155Z installation finished. 2025-05-07T19:43:13.1933407Z 2025-05-07T19:43:13.1933950Z + rm -f miniconda.sh 2025-05-07T19:43:13.1934376Z 2025-05-07T19:43:13.2066244Z 2025-05-07T19:43:13.2066904Z [SETUP] Reloading the bash configuration ... 
2025-05-07T19:43:13.2068047Z + /github/home/miniconda/bin/conda init bash 2025-05-07T19:43:13.2068719Z 2025-05-07T19:43:13.5700717Z no change /github/home/miniconda/condabin/conda 2025-05-07T19:43:13.5701871Z no change /github/home/miniconda/bin/conda 2025-05-07T19:43:13.5702942Z no change /github/home/miniconda/bin/conda-env 2025-05-07T19:43:13.5704231Z no change /github/home/miniconda/bin/activate 2025-05-07T19:43:13.5705299Z no change /github/home/miniconda/bin/deactivate 2025-05-07T19:43:13.5706532Z no change /github/home/miniconda/etc/profile.d/conda.sh 2025-05-07T19:43:13.5706992Z no change /github/home/miniconda/etc/fish/conf.d/conda.fish 2025-05-07T19:43:13.5707496Z no change /github/home/miniconda/shell/condabin/Conda.psm1 2025-05-07T19:43:13.5707985Z no change /github/home/miniconda/shell/condabin/conda-hook.ps1 2025-05-07T19:43:13.5708584Z no change /github/home/miniconda/lib/python3.13/site-packages/xontrib/conda.xsh 2025-05-07T19:43:13.5709712Z no change /github/home/miniconda/etc/profile.d/conda.csh 2025-05-07T19:43:13.5710153Z modified /github/home/.bashrc 2025-05-07T19:43:13.5710371Z 2025-05-07T19:43:13.5710634Z ==> For changes to take effect, close and re-open your current shell. <== 2025-05-07T19:43:13.5710967Z 2025-05-07T19:43:13.6243895Z 2025-05-07T19:43:13.6244527Z + . /github/home/.bashrc 2025-05-07T19:43:13.6245102Z 2025-05-07T19:43:14.4103619Z 2025-05-07T19:43:14.4104676Z [SETUP] Installing libmamba-solver (required since Anaconda 2024.02-1) and libarchive ... 
2025-05-07T19:43:14.4132477Z [EXEC] [ATTEMPT 0/3] + conda install --solver=classic -c conda-forge --override-channels -y conda-libmamba-solver libmamba libmambapy libarchive 2025-05-07T19:43:26.1754507Z Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - done 2025-05-07T19:43:27.6382374Z Solving environment: | / - \ | / - \ | / - done 2025-05-07T19:43:27.7269393Z 2025-05-07T19:43:27.7269935Z ## Package Plan ## 2025-05-07T19:43:27.7270483Z 2025-05-07T19:43:27.7270876Z environment location: /github/home/miniconda 2025-05-07T19:43:27.7271615Z 2025-05-07T19:43:27.7271892Z added / updated specs: 2025-05-07T19:43:27.7272663Z - conda-libmamba-solver 2025-05-07T19:43:27.7273386Z - libarchive 2025-05-07T19:43:27.7274011Z - libmamba 2025-05-07T19:43:27.7274568Z - libmambapy 2025-05-07T19:43:27.7274927Z 2025-05-07T19:43:27.7274939Z 2025-05-07T19:43:27.7275304Z The following packages will be downloaded: 2025-05-07T19:43:27.7275943Z 2025-05-07T19:43:27.7276271Z package | build 2025-05-07T19:43:27.7276849Z ---------------------------|----------------- 2025-05-07T19:43:27.7277295Z ca-certificates-2025.4.26 | hbd8a1cb_0 149 KB conda-forge 2025-05-07T19:43:27.7278111Z certifi-2025.4.26 | pyhd8ed1ab_0 154 KB conda-forge 2025-05-07T19:43:27.7278581Z conda-25.3.1 | py313h78bf25f_1 1.1 MB conda-forge 2025-05-07T19:43:27.7279076Z conda-libmamba-solver-25.4.0| pyhd8ed1ab_0 41 KB conda-forge 2025-05-07T19:43:27.7279570Z ------------------------------------------------------------ 2025-05-07T19:43:27.7279926Z Total: 1.4 MB 2025-05-07T19:43:27.7280167Z 2025-05-07T19:43:27.7280285Z The following packages will be UPDATED: 2025-05-07T19:43:27.7280499Z 2025-05-07T19:43:27.7284911Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 
2025-05-07T19:43:27.7285740Z conda pkgs/main::conda-25.3.1-py313h06a4308~ --> conda-forge::conda-25.3.1-py313h78bf25f_1 2025-05-07T19:43:27.7286171Z 2025-05-07T19:43:27.7286397Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:43:27.7286737Z 2025-05-07T19:43:27.7287089Z certifi pkgs/main/linux-64::certifi-2025.4.26~ --> conda-forge/noarch::certifi-2025.4.26-pyhd8ed1ab_0 2025-05-07T19:43:27.7287916Z conda-libmamba-so~ pkgs/main::conda-libmamba-solver-25.4~ --> conda-forge::conda-libmamba-solver-25.4.0-pyhd8ed1ab_0 2025-05-07T19:43:27.7288430Z 2025-05-07T19:43:27.7288434Z 2025-05-07T19:43:27.7288437Z 2025-05-07T19:43:27.7288582Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:27.7288999Z conda-25.3.1 | 1.1 MB | | 0% 2025-05-07T19:43:27.7289232Z 2025-05-07T19:43:27.7289545Z certifi-2025.4.26 | 154 KB | | 0%  2025-05-07T19:43:27.7289790Z 2025-05-07T19:43:27.7289794Z 2025-05-07T19:43:27.7292669Z ca-certificates-2025 | 149 KB | | 0%  2025-05-07T19:43:27.7292946Z 2025-05-07T19:43:27.7292950Z 2025-05-07T19:43:27.7293359Z 2025-05-07T19:43:27.7813176Z conda-libmamba-solve | 41 KB | | 0%  2025-05-07T19:43:27.7813537Z 2025-05-07T19:43:27.7813541Z 2025-05-07T19:43:27.7813545Z 2025-05-07T19:43:27.7947952Z conda-libmamba-solve | 41 KB | ########## | 100%  2025-05-07T19:43:27.7948296Z 2025-05-07T19:43:27.7948300Z 2025-05-07T19:43:27.7948304Z 2025-05-07T19:43:27.7949587Z conda-libmamba-solve | 41 KB | ########## | 100%  2025-05-07T19:43:27.7949870Z 2025-05-07T19:43:27.7949927Z 2025-05-07T19:43:27.7971756Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:27.7972633Z 2025-05-07T19:43:27.8041583Z certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:27.8079754Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:27.8080546Z 2025-05-07T19:43:27.8080587Z 2025-05-07T19:43:27.8132585Z ca-certificates-2025 | 149 KB | ########## | 100%  2025-05-07T19:43:27.8133458Z 2025-05-07T19:43:27.9013599Z 
certifi-2025.4.26 | 154 KB | ########## | 100%  2025-05-07T19:43:27.9014799Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:27.9017261Z conda-25.3.1 | 1.1 MB | ########## | 100% 2025-05-07T19:43:27.9017977Z 2025-05-07T19:43:27.9018213Z 2025-05-07T19:43:27.9018417Z  2025-05-07T19:43:27.9018634Z 2025-05-07T19:43:27.9018638Z 2025-05-07T19:43:27.9018841Z  2025-05-07T19:43:27.9019060Z 2025-05-07T19:43:27.9019064Z 2025-05-07T19:43:27.9019069Z 2025-05-07T19:43:27.9019261Z  done 2025-05-07T19:43:28.0027360Z Preparing transaction: | done 2025-05-07T19:43:28.1035047Z Verifying transaction: - done 2025-05-07T19:43:29.4058811Z Executing transaction: | / - \ | / - \ | / - \ | done 2025-05-07T19:43:30.9898140Z [SETUP] Updating Miniconda base packages ... 2025-05-07T19:43:30.9924724Z [EXEC] [ATTEMPT 0/3] + conda update -n base -c defaults --update-deps -y conda 2025-05-07T19:43:31.7239117Z Channels: 2025-05-07T19:43:31.7239448Z - defaults 2025-05-07T19:43:31.7239679Z Platform: linux-64 2025-05-07T19:43:32.8017546Z Collecting package metadata (repodata.json): - \ | / - \ done 2025-05-07T19:43:32.9314402Z Solving environment: / - Channels: 2025-05-07T19:43:32.9314878Z - defaults 2025-05-07T19:43:32.9315097Z Platform: linux-64 2025-05-07T19:43:33.2148804Z Collecting package metadata (repodata.json): | / - \ done 2025-05-07T19:43:33.4231393Z Solving environment: / - \ | done 2025-05-07T19:43:33.5235907Z done 2025-05-07T19:43:33.5868196Z 2025-05-07T19:43:33.5868753Z ## Package Plan ## 2025-05-07T19:43:33.5868949Z 2025-05-07T19:43:33.5869088Z environment location: /github/home/miniconda 2025-05-07T19:43:33.5869344Z 2025-05-07T19:43:33.5869461Z added / updated specs: 2025-05-07T19:43:33.5869768Z - conda 2025-05-07T19:43:33.5869930Z 2025-05-07T19:43:33.5869934Z 2025-05-07T19:43:33.5870054Z The following packages will be downloaded: 2025-05-07T19:43:33.5870278Z 2025-05-07T19:43:33.5870413Z package | build 2025-05-07T19:43:33.5870745Z 
---------------------------|----------------- 2025-05-07T19:43:33.5871118Z pip-25.1 | pyhc872135_2 1.3 MB 2025-05-07T19:43:33.5871518Z tzdata-2025b | h04d1e81_0 116 KB 2025-05-07T19:43:33.5871912Z ------------------------------------------------------------ 2025-05-07T19:43:33.5872256Z Total: 1.4 MB 2025-05-07T19:43:33.5872488Z 2025-05-07T19:43:33.5872605Z The following packages will be UPDATED: 2025-05-07T19:43:33.5872816Z 2025-05-07T19:43:33.5873166Z pip pkgs/main/linux-64::pip-25.0-py313h06~ --> pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:33.5874017Z tzdata 2025a-h04d1e81_0 --> 2025b-h04d1e81_0 2025-05-07T19:43:33.5874311Z 2025-05-07T19:43:33.5874315Z 2025-05-07T19:43:33.5874318Z 2025-05-07T19:43:33.5874476Z Downloading and Extracting Packages: ...working... 2025-05-07T19:43:33.5874847Z pip-25.1 | 1.3 MB | | 0% 2025-05-07T19:43:33.5875084Z 2025-05-07T19:43:33.6224788Z tzdata-2025b | 116 KB | | 0%  2025-05-07T19:43:33.6225079Z 2025-05-07T19:43:33.6487996Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:33.8923279Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:33.8923646Z 2025-05-07T19:43:33.8924090Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:33.8924355Z 2025-05-07T19:43:33.8981622Z tzdata-2025b | 116 KB | ########## | 100%  2025-05-07T19:43:33.8982044Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:33.8982859Z pip-25.1 | 1.3 MB | ########## | 100% 2025-05-07T19:43:33.8983204Z 2025-05-07T19:43:33.8983561Z 2025-05-07T19:43:33.8984291Z  done 2025-05-07T19:43:33.9993998Z Preparing transaction: - done 2025-05-07T19:43:34.0999486Z Verifying transaction: | done 2025-05-07T19:43:36.1035477Z Executing transaction: - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:43:36.6517514Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:43:36.6518349Z + conda clean --packages --tarball -y 2025-05-07T19:43:36.6518569Z 2025-05-07T19:43:37.0849346Z Will remove 99 (117.8 MB) tarball(s). 
2025-05-07T19:43:37.0850329Z Will remove 11 (16.0 MB) package(s). 2025-05-07T19:43:37.1413304Z 2025-05-07T19:43:37.1417059Z + conda clean --all -y 2025-05-07T19:43:37.1417606Z 2025-05-07T19:43:37.5859302Z There are no unused tarball(s) to remove. 2025-05-07T19:43:37.5860300Z Will remove 1 index cache(s). 2025-05-07T19:43:37.5861200Z There are no unused package(s) to remove. 2025-05-07T19:43:37.5862739Z There are no tempfile(s) to remove. 2025-05-07T19:43:37.5863827Z There are no logfile(s) to remove. 2025-05-07T19:43:37.6395145Z 2025-05-07T19:43:37.6396437Z + conda info 2025-05-07T19:43:37.6396592Z 2025-05-07T19:43:38.2024864Z 2025-05-07T19:43:38.2025412Z active environment : base 2025-05-07T19:43:38.2025766Z active env location : /github/home/miniconda 2025-05-07T19:43:38.2026118Z shell level : 1 2025-05-07T19:43:38.2026411Z user config file : /github/home/.condarc 2025-05-07T19:43:38.2026861Z populated config files : /github/home/miniconda/.condarc 2025-05-07T19:43:38.2027260Z conda version : 25.3.1 2025-05-07T19:43:38.2027555Z conda-build version : not installed 2025-05-07T19:43:38.2028008Z python version : 3.13.2.final.0 2025-05-07T19:43:38.2028318Z solver : libmamba (default) 2025-05-07T19:43:38.2028682Z virtual packages : __archspec=1=cascadelake 2025-05-07T19:43:38.2029116Z __conda=25.3.1=0 2025-05-07T19:43:38.2029417Z __glibc=2.34=0 2025-05-07T19:43:38.2029705Z __linux=6.1.130=0 2025-05-07T19:43:38.2029993Z __unix=0=0 2025-05-07T19:43:38.2030359Z base environment : /github/home/miniconda (writable) 2025-05-07T19:43:38.2030757Z conda av data dir : /github/home/miniconda/etc/conda 2025-05-07T19:43:38.2031133Z conda av metadata url : None 2025-05-07T19:43:38.2031504Z channel URLs : https://repo.anaconda.com/pkgs/main/linux-64 2025-05-07T19:43:38.2031951Z https://repo.anaconda.com/pkgs/main/noarch 2025-05-07T19:43:38.2032341Z https://repo.anaconda.com/pkgs/r/linux-64 2025-05-07T19:43:38.2032751Z https://repo.anaconda.com/pkgs/r/noarch 
2025-05-07T19:43:38.2033147Z package cache : /github/home/miniconda/pkgs 2025-05-07T19:43:38.2033806Z /github/home/.conda/pkgs 2025-05-07T19:43:38.2034209Z envs directories : /github/home/miniconda/envs 2025-05-07T19:43:38.2034560Z /github/home/.conda/envs 2025-05-07T19:43:38.2034914Z platform : linux-64 2025-05-07T19:43:38.2035775Z user-agent : conda/25.3.1 requests/2.32.3 CPython/3.13.2 Linux/6.1.130-139.222.amzn2023.x86_64 amzn/2023.7.20250428 glibc/2.34 solver/libmamba conda-libmamba-solver/25.4.0 libmambapy/2.0.5 aau/0.7.0 c/. s/. e/. 2025-05-07T19:43:38.2036666Z UID:GID : 0:0 2025-05-07T19:43:38.2036976Z netrc file : None 2025-05-07T19:43:38.2037254Z offline mode : False 2025-05-07T19:43:38.2037437Z 2025-05-07T19:43:38.2619133Z 2025-05-07T19:43:38.2619791Z [SETUP] Exporting Miniconda variables ... 2025-05-07T19:43:38.2621664Z [SETUP] Saving Miniconda variables to /__w/_temp/_runner_file_commands/add_path_816a531e-2619-4447-82f8-e43a8d85b45a ... 2025-05-07T19:43:38.2623927Z [SETUP] Successfully set up Miniconda at /github/home/miniconda 2025-05-07T19:43:38.2790652Z ##[group]Run . $PRELUDE; create_conda_environment $BUILD_ENV 3.12 2025-05-07T19:43:38.2791196Z . 
$PRELUDE; create_conda_environment $BUILD_ENV 3.12 2025-05-07T19:43:38.2791918Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:43:38.2792261Z env: 2025-05-07T19:43:38.2792499Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:43:38.2792807Z BUILD_ENV: build_binary 2025-05-07T19:43:38.2793065Z BUILD_TARGET: genai 2025-05-07T19:43:38.2793290Z BUILD_VARIANT: cuda 2025-05-07T19:43:38.2793537Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:43:38.2793785Z ##[endgroup] 2025-05-07T19:43:38.7044262Z ################################################################################ 2025-05-07T19:43:38.7045377Z # Create Conda Environment 2025-05-07T19:43:38.7046066Z # 2025-05-07T19:43:38.7067316Z # [2025-05-07T19:43:38.706Z] + create_conda_environment build_binary 3.12 2025-05-07T19:43:38.7067952Z ################################################################################ 2025-05-07T19:43:38.7068588Z 2025-05-07T19:43:38.7094173Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:43:38.7917265Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:43:38.7918415Z [SETUP] Listing existing Conda environments ... 2025-05-07T19:43:38.7919421Z + conda info --envs 2025-05-07T19:43:38.7919829Z 2025-05-07T19:43:39.3497820Z 2025-05-07T19:43:39.3498169Z # conda environments: 2025-05-07T19:43:39.3499204Z # 2025-05-07T19:43:39.3499525Z base /github/home/miniconda 2025-05-07T19:43:39.3499803Z 2025-05-07T19:43:39.4092308Z 2025-05-07T19:43:39.4093081Z [SETUP] Deleting the prefix directory if it exists ... 2025-05-07T19:43:41.0641020Z + rm -rf /github/home/miniconda/envs/build_binary 2025-05-07T19:43:41.0641426Z 2025-05-07T19:43:41.0655934Z 2025-05-07T19:43:41.0668817Z [SETUP] Creating new Conda environment (Python 3.12) ... 
2025-05-07T19:43:41.0694345Z [EXEC] [ATTEMPT 0/3] + conda create -y -n build_binary python=3.12 2025-05-07T19:43:41.6437200Z Channels: 2025-05-07T19:43:41.6437869Z - defaults 2025-05-07T19:43:41.6438517Z Platform: linux-64 2025-05-07T19:43:43.0113662Z Collecting package metadata (repodata.json): - \ | / - \ | / done 2025-05-07T19:43:43.1122572Z Solving environment: \ done 2025-05-07T19:43:43.1413956Z 2025-05-07T19:43:43.1414373Z ## Package Plan ## 2025-05-07T19:43:43.1414569Z 2025-05-07T19:43:43.1414878Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:43:43.1415206Z 2025-05-07T19:43:43.1415329Z added / updated specs: 2025-05-07T19:43:43.1415635Z - python=3.12 2025-05-07T19:43:43.1415786Z 2025-05-07T19:43:43.1415791Z 2025-05-07T19:43:43.1415952Z The following packages will be downloaded: 2025-05-07T19:43:43.1416185Z 2025-05-07T19:43:43.1416317Z package | build 2025-05-07T19:43:43.1416695Z ---------------------------|----------------- 2025-05-07T19:43:43.1417123Z _libgcc_mutex-0.1 | main 3 KB 2025-05-07T19:43:43.1417602Z _openmp_mutex-5.1 | 1_gnu 21 KB 2025-05-07T19:43:43.1418053Z ca-certificates-2025.2.25 | h06a4308_0 129 KB 2025-05-07T19:43:43.1418522Z python-3.12.9 | h5148396_0 34.7 MB 2025-05-07T19:43:43.1418977Z setuptools-78.1.1 | py312h06a4308_0 2.2 MB 2025-05-07T19:43:43.1419403Z wheel-0.45.1 | py312h06a4308_0 147 KB 2025-05-07T19:43:43.1419824Z ------------------------------------------------------------ 2025-05-07T19:43:43.1420188Z Total: 37.2 MB 2025-05-07T19:43:43.1420441Z 2025-05-07T19:43:43.1420582Z The following NEW packages will be INSTALLED: 2025-05-07T19:43:43.1420822Z 2025-05-07T19:43:43.1421080Z _libgcc_mutex pkgs/main/linux-64::_libgcc_mutex-0.1-main 2025-05-07T19:43:43.1421703Z _openmp_mutex pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 2025-05-07T19:43:43.1422511Z bzip2 pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6 2025-05-07T19:43:43.1423032Z ca-certificates pkgs/main/linux-64::ca-certificates-2025.2.25-h06a4308_0 
2025-05-07T19:43:43.1423692Z expat pkgs/main/linux-64::expat-2.7.1-h6a678d5_0 2025-05-07T19:43:43.1424429Z ld_impl_linux-64 pkgs/main/linux-64::ld_impl_linux-64-2.40-h12ee557_0 2025-05-07T19:43:43.1424957Z libffi pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1 2025-05-07T19:43:43.1425453Z libgcc-ng pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 2025-05-07T19:43:43.1425916Z libgomp pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 2025-05-07T19:43:43.1426436Z libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 2025-05-07T19:43:43.1426926Z libuuid pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 2025-05-07T19:43:43.1427399Z ncurses pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 2025-05-07T19:43:43.1427892Z openssl pkgs/main/linux-64::openssl-3.0.16-h5eee18b_0 2025-05-07T19:43:43.1428521Z pip pkgs/main/noarch::pip-25.1-pyhc872135_2 2025-05-07T19:43:43.1428971Z python pkgs/main/linux-64::python-3.12.9-h5148396_0 2025-05-07T19:43:43.1429432Z readline pkgs/main/linux-64::readline-8.2-h5eee18b_0 2025-05-07T19:43:43.1430077Z setuptools pkgs/main/linux-64::setuptools-78.1.1-py312h06a4308_0 2025-05-07T19:43:43.1430563Z sqlite pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0 2025-05-07T19:43:43.1430950Z tk pkgs/main/linux-64::tk-8.6.14-h39e8969_0 2025-05-07T19:43:43.1431352Z tzdata pkgs/main/noarch::tzdata-2025b-h04d1e81_0 2025-05-07T19:43:43.1431769Z wheel pkgs/main/linux-64::wheel-0.45.1-py312h06a4308_0 2025-05-07T19:43:43.1432191Z xz pkgs/main/linux-64::xz-5.6.4-h5eee18b_1 2025-05-07T19:43:43.1432598Z zlib pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1 2025-05-07T19:43:43.1432850Z 2025-05-07T19:43:43.1432857Z 2025-05-07T19:43:43.1432861Z 2025-05-07T19:43:43.1433009Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:43:43.1433434Z python-3.12.9 | 34.7 MB | | 0%
2025-05-07T19:43:43.1442675Z setuptools-78.1.1 | 2.2 MB | | 0%
2025-05-07T19:43:43.1449380Z wheel-0.45.1 | 147 KB | | 0%
2025-05-07T19:43:43.1460823Z ca-certificates-2025 | 129 KB | | 0%
2025-05-07T19:43:43.1482955Z _openmp_mutex-5.1 | 21 KB | | 0%
2025-05-07T19:43:43.1901503Z _libgcc_mutex-0.1 | 3 KB | | 0%
2025-05-07T19:43:43.1955746Z wheel-0.45.1 | 147 KB | ########## | 100%
2025-05-07T19:43:43.2058797Z ca-certificates-2025 | 129 KB | ########## | 100%
2025-05-07T19:43:43.2130343Z _openmp_mutex-5.1 | 21 KB | ########## | 100%
2025-05-07T19:43:43.2138586Z ca-certificates-2025 | 129 KB | ########## | 100%
2025-05-07T19:43:43.2306219Z _libgcc_mutex-0.1 | 3 KB | ########## | 100%
2025-05-07T19:43:43.2417663Z _libgcc_mutex-0.1 | 3 KB | ########## | 100%
2025-05-07T19:43:43.2424944Z python-3.12.9 | 34.7 MB | 3 | 3%
2025-05-07T19:43:43.2688089Z setuptools-78.1.1 | 2.2 MB | ########1 | 82%
2025-05-07T19:43:43.2688830Z wheel-0.45.1 | 147 KB | ########## | 100%
2025-05-07T19:43:43.2713761Z wheel-0.45.1 | 147 KB | ########## | 100%
2025-05-07T19:43:43.2793360Z setuptools-78.1.1 | 2.2 MB | ########## | 100%
2025-05-07T19:43:43.2796703Z _openmp_mutex-5.1 | 21 KB | ########## | 100%
2025-05-07T19:43:43.3417127Z _openmp_mutex-5.1 | 21 KB | ########## | 100%
2025-05-07T19:43:43.4418617Z python-3.12.9 | 34.7 MB | ###7 | 37%
2025-05-07T19:43:43.5247781Z python-3.12.9 | 34.7 MB | ########7 | 88%
2025-05-07T19:43:43.5743672Z setuptools-78.1.1 | 2.2 MB | ########## | 100%
2025-05-07T19:43:44.1115062Z python-3.12.9 | 34.7 MB | ########## | 100%
2025-05-07T19:43:44.1116997Z python-3.12.9 | 34.7 MB | ########## | 100%
2025-05-07T19:43:44.1119988Z done
2025-05-07T19:43:44.3231556Z Preparing transaction: / - done
2025-05-07T19:43:45.8557259Z Verifying transaction: | / - \ | / - \ | / - \ | / done
2025-05-07T19:43:48.0661784Z Executing transaction: \ | / - \ | / - \ | / - \ | / - \ | / - \ | done 2025-05-07T19:43:48.0698802Z # 2025-05-07T19:43:48.0699492Z # To activate this environment, use 2025-05-07T19:43:48.0700329Z # 2025-05-07T19:43:48.0700878Z # $ conda activate build_binary 2025-05-07T19:43:48.0701650Z # 2025-05-07T19:43:48.0702245Z # To deactivate an active environment, use 2025-05-07T19:43:48.0703087Z # 2025-05-07T19:43:48.0703887Z # $ conda deactivate 2025-05-07T19:43:48.0704342Z 2025-05-07T19:43:48.1532271Z [SETUP] Upgrading PIP to latest ... 2025-05-07T19:43:48.1557239Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --upgrade pip 2025-05-07T19:43:51.1450996Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:43:51.1452852Z 2025-05-07T19:43:51.1453302Z Requirement already satisfied: pip in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (25.1) 2025-05-07T19:43:51.1453908Z Collecting pip 2025-05-07T19:43:51.1454238Z Downloading pip-25.1.1-py3-none-any.whl.metadata (3.6 kB) 2025-05-07T19:43:51.1454668Z Downloading pip-25.1.1-py3-none-any.whl (1.8 MB) 2025-05-07T19:43:51.1455581Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 62.5 MB/s eta 0:00:00 2025-05-07T19:43:51.1455950Z Installing collected packages: pip 2025-05-07T19:43:51.1456259Z Attempting uninstall: pip 2025-05-07T19:43:51.1456560Z Found existing installation: pip 25.1 2025-05-07T19:43:51.1456873Z Uninstalling pip-25.1: 2025-05-07T19:43:51.1457163Z Successfully uninstalled pip-25.1 2025-05-07T19:43:51.1457475Z Successfully installed pip-25.1.1 2025-05-07T19:43:51.1457670Z 2025-05-07T19:43:51.2035868Z [SETUP] Upgrading pyOpenSSL ... 2025-05-07T19:43:51.2059935Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y pyOpenSSL>22.1.0 2025-05-07T19:43:51.8652132Z Channels: 2025-05-07T19:43:51.8652437Z - conda-forge 2025-05-07T19:43:51.8652696Z Platform: linux-64 2025-05-07T19:44:01.6888882Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:44:03.6172032Z Solving environment: / - \ | / done 2025-05-07T19:44:03.6632143Z 2025-05-07T19:44:03.6632672Z ## Package Plan ## 2025-05-07T19:44:03.6632888Z 2025-05-07T19:44:03.6633133Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:03.6633454Z 2025-05-07T19:44:03.6633556Z added / updated specs: 2025-05-07T19:44:03.6633846Z - pyopenssl[version='>22.1.0'] 2025-05-07T19:44:03.6634039Z 2025-05-07T19:44:03.6634045Z 2025-05-07T19:44:03.6634170Z The following packages will be downloaded: 2025-05-07T19:44:03.6634406Z 2025-05-07T19:44:03.6634571Z package | build 2025-05-07T19:44:03.6634937Z 
---------------------------|----------------- 2025-05-07T19:44:03.6635312Z cffi-1.17.1 | py312h06ac9bb_0 288 KB conda-forge 2025-05-07T19:44:03.6635788Z cryptography-44.0.3 | py312hda17c39_0 1.5 MB conda-forge 2025-05-07T19:44:03.6636239Z expat-2.7.0 | h5888daf_0 137 KB conda-forge 2025-05-07T19:44:03.6636671Z libexpat-2.7.0 | h5888daf_0 73 KB conda-forge 2025-05-07T19:44:03.6637147Z libgcc-15.1.0 | h767d61c_2 810 KB conda-forge 2025-05-07T19:44:03.6637574Z libgcc-ng-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:44:03.6638017Z libgomp-15.1.0 | h767d61c_2 442 KB conda-forge 2025-05-07T19:44:03.6638432Z libnsl-2.0.1 | hd590300_0 33 KB conda-forge 2025-05-07T19:44:03.6638877Z libsqlite-3.46.0 | hde9e2c9_0 845 KB conda-forge 2025-05-07T19:44:03.6639334Z libuuid-2.38.1 | h0b41bf4_0 33 KB conda-forge 2025-05-07T19:44:03.6639763Z libxcrypt-4.4.36 | hd590300_1 98 KB conda-forge 2025-05-07T19:44:03.6640205Z libzlib-1.2.13 | h4ab18f5_6 60 KB conda-forge 2025-05-07T19:44:03.6640623Z openssl-3.5.0 | h7b32b05_1 3.0 MB conda-forge 2025-05-07T19:44:03.6641075Z pycparser-2.22 | pyh29332c3_1 108 KB conda-forge 2025-05-07T19:44:03.6641520Z pyopenssl-25.0.0 | pyhd8ed1ab_0 120 KB conda-forge 2025-05-07T19:44:03.6641988Z python-3.12.2 |hab00c5b_0_cpython 30.8 MB conda-forge 2025-05-07T19:44:03.6642440Z python_abi-3.12 | 7_cp312 7 KB conda-forge 2025-05-07T19:44:03.6642905Z typing-extensions-4.13.2 | h0e9735f_0 88 KB conda-forge 2025-05-07T19:44:03.6643814Z typing_extensions-4.13.2 | pyh29332c3_0 51 KB conda-forge 2025-05-07T19:44:03.6644285Z zlib-1.2.13 | h4ab18f5_6 91 KB conda-forge 2025-05-07T19:44:03.6644693Z ------------------------------------------------------------ 2025-05-07T19:44:03.6645046Z Total: 38.6 MB 2025-05-07T19:44:03.6645281Z 2025-05-07T19:44:03.6645412Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:03.6645642Z 2025-05-07T19:44:03.6645881Z cffi conda-forge/linux-64::cffi-1.17.1-py312h06ac9bb_0 2025-05-07T19:44:03.6646389Z cryptography 
conda-forge/linux-64::cryptography-44.0.3-py312hda17c39_0 2025-05-07T19:44:03.6646930Z libexpat conda-forge/linux-64::libexpat-2.7.0-h5888daf_0 2025-05-07T19:44:03.6647406Z libgcc conda-forge/linux-64::libgcc-15.1.0-h767d61c_2 2025-05-07T19:44:03.6647886Z libnsl conda-forge/linux-64::libnsl-2.0.1-hd590300_0 2025-05-07T19:44:03.6648708Z libsqlite conda-forge/linux-64::libsqlite-3.46.0-hde9e2c9_0 2025-05-07T19:44:03.6649211Z libxcrypt conda-forge/linux-64::libxcrypt-4.4.36-hd590300_1 2025-05-07T19:44:03.6649732Z libzlib conda-forge/linux-64::libzlib-1.2.13-h4ab18f5_6 2025-05-07T19:44:03.6650227Z pycparser conda-forge/noarch::pycparser-2.22-pyh29332c3_1 2025-05-07T19:44:03.6650764Z pyopenssl conda-forge/noarch::pyopenssl-25.0.0-pyhd8ed1ab_0 2025-05-07T19:44:03.6651288Z python_abi conda-forge/noarch::python_abi-3.12-7_cp312 2025-05-07T19:44:03.6651840Z typing-extensions conda-forge/noarch::typing-extensions-4.13.2-h0e9735f_0 2025-05-07T19:44:03.6652498Z typing_extensions conda-forge/noarch::typing_extensions-4.13.2-pyh29332c3_0 2025-05-07T19:44:03.6652870Z 2025-05-07T19:44:03.6653001Z The following packages will be UPDATED: 2025-05-07T19:44:03.6653255Z 2025-05-07T19:44:03.6653692Z ca-certificates pkgs/main/linux-64::ca-certificates-2~ --> conda-forge/noarch::ca-certificates-2025.4.26-hbd8a1cb_0 2025-05-07T19:44:03.6654564Z libgcc-ng pkgs/main::libgcc-ng-11.2.0-h1234567_1 --> conda-forge::libgcc-ng-15.1.0-h69a702a_2 2025-05-07T19:44:03.6655261Z libgomp pkgs/main::libgomp-11.2.0-h1234567_1 --> conda-forge::libgomp-15.1.0-h767d61c_2 2025-05-07T19:44:03.6655972Z libuuid pkgs/main::libuuid-1.41.5-h5eee18b_0 --> conda-forge::libuuid-2.38.1-h0b41bf4_0 2025-05-07T19:44:03.6656669Z openssl pkgs/main::openssl-3.0.16-h5eee18b_0 --> conda-forge::openssl-3.5.0-h7b32b05_1 2025-05-07T19:44:03.6657307Z zlib pkgs/main::zlib-1.2.13-h5eee18b_1 --> conda-forge::zlib-1.2.13-h4ab18f5_6 2025-05-07T19:44:03.6657686Z 2025-05-07T19:44:03.6657923Z The following packages will be SUPERSEDED by a 
higher-priority channel: 2025-05-07T19:44:03.6658264Z 2025-05-07T19:44:03.6658554Z expat pkgs/main::expat-2.7.1-h6a678d5_0 --> conda-forge::expat-2.7.0-h5888daf_0 2025-05-07T19:44:03.6659237Z python pkgs/main::python-3.12.9-h5148396_0 --> conda-forge::python-3.12.2-hab00c5b_0_cpython 2025-05-07T19:44:03.6659651Z 2025-05-07T19:44:03.6659680Z 2025-05-07T19:44:03.6659684Z 2025-05-07T19:44:03.6659843Z Downloading and Extracting Packages: ...working... 2025-05-07T19:44:03.6660254Z python-3.12.2 | 30.8 MB | | 0% 2025-05-07T19:44:03.6660529Z 2025-05-07T19:44:03.6661033Z openssl-3.5.0 | 3.0 MB | | 0%  2025-05-07T19:44:03.6661290Z 2025-05-07T19:44:03.6661294Z 2025-05-07T19:44:03.6661561Z cryptography-44.0.3 | 1.5 MB | | 0%  2025-05-07T19:44:03.6661840Z 2025-05-07T19:44:03.6661843Z 2025-05-07T19:44:03.6661847Z 2025-05-07T19:44:03.6679517Z libsqlite-3.46.0 | 845 KB | | 0%  2025-05-07T19:44:03.6679849Z 2025-05-07T19:44:03.6679853Z 2025-05-07T19:44:03.6679856Z 2025-05-07T19:44:03.6679864Z 2025-05-07T19:44:03.6700711Z libgcc-15.1.0 | 810 KB | | 0%  2025-05-07T19:44:03.6701076Z 2025-05-07T19:44:03.6701081Z 2025-05-07T19:44:03.6701085Z 2025-05-07T19:44:03.6701088Z 2025-05-07T19:44:03.6701092Z 2025-05-07T19:44:03.6701352Z libgomp-15.1.0 | 442 KB | | 0%  2025-05-07T19:44:03.6701638Z 2025-05-07T19:44:03.6701642Z 2025-05-07T19:44:03.6701646Z 2025-05-07T19:44:03.6701680Z 2025-05-07T19:44:03.6701684Z 2025-05-07T19:44:03.6701687Z 2025-05-07T19:44:03.6705116Z cffi-1.17.1 | 288 KB | | 0%  2025-05-07T19:44:03.6705395Z 2025-05-07T19:44:03.6705398Z 2025-05-07T19:44:03.6705402Z 2025-05-07T19:44:03.6705406Z 2025-05-07T19:44:03.6705409Z 2025-05-07T19:44:03.6705413Z 2025-05-07T19:44:03.6705450Z 2025-05-07T19:44:03.6706198Z expat-2.7.0 | 137 KB | | 0%  2025-05-07T19:44:03.6706477Z 2025-05-07T19:44:03.6706480Z 2025-05-07T19:44:03.6706490Z 2025-05-07T19:44:03.6706494Z 2025-05-07T19:44:03.6706497Z 2025-05-07T19:44:03.6706609Z 2025-05-07T19:44:03.6706641Z 2025-05-07T19:44:03.6706644Z 
2025-05-07T19:44:03.6709256Z pyopenssl-25.0.0 | 120 KB | | 0%
2025-05-07T19:44:03.6719337Z pycparser-2.22 | 108 KB | | 0%
2025-05-07T19:44:03.6721293Z libxcrypt-4.4.36 | 98 KB | | 0%
2025-05-07T19:44:03.6732503Z zlib-1.2.13 | 91 KB | | 0%
2025-05-07T19:44:03.6733728Z typing-extensions-4. | 88 KB | | 0%
2025-05-07T19:44:03.6734593Z libexpat-2.7.0 | 73 KB | | 0%
2025-05-07T19:44:03.6735642Z libzlib-1.2.13 | 60 KB | | 0%
2025-05-07T19:44:03.6745755Z typing_extensions-4. | 51 KB | | 0%
2025-05-07T19:44:03.6746897Z libgcc-ng-15.1.0 | 34 KB | | 0%
2025-05-07T19:44:03.6748320Z libuuid-2.38.1 | 33 KB | | 0%
2025-05-07T19:44:03.6749535Z libnsl-2.0.1 | 33 KB | | 0%
2025-05-07T19:44:03.7346567Z ... (more hidden) ...
2025-05-07T19:44:03.7463653Z libsqlite-3.46.0 | 845 KB | ########## | 100%
2025-05-07T19:44:03.7501251Z libgcc-15.1.0 | 810 KB | ########## | 100%
2025-05-07T19:44:03.7635513Z openssl-3.5.0 | 3.0 MB | ########## | 100%
2025-05-07T19:44:03.7652126Z python-3.12.2 | 30.8 MB | 1 | 1%
2025-05-07T19:44:03.7719453Z cryptography-44.0.3 | 1.5 MB | 6 | 6%
2025-05-07T19:44:03.7759618Z libgomp-15.1.0 | 442 KB | 3 | 4%
2025-05-07T19:44:03.7766712Z libsqlite-3.46.0 | 845 KB | ########## | 100%
2025-05-07T19:44:03.7776494Z libsqlite-3.46.0 | 845 KB | ########## | 100%
2025-05-07T19:44:03.7859977Z expat-2.7.0 | 137 KB | #1 | 12%
2025-05-07T19:44:03.7939063Z expat-2.7.0 | 137 KB | ########## | 100%
2025-05-07T19:44:03.8032169Z cffi-1.17.1 | 288 KB | 5 | 6%
2025-05-07T19:44:03.8113717Z libgomp-15.1.0 | 442 KB | ########## | 100%
2025-05-07T19:44:03.8151667Z cffi-1.17.1 | 288 KB | ########## | 100%
2025-05-07T19:44:03.8194952Z pyopenssl-25.0.0 | 120 KB | #3 | 13%
2025-05-07T19:44:03.8290719Z pyopenssl-25.0.0 | 120 KB | ########## | 100%
2025-05-07T19:44:03.8322419Z cryptography-44.0.3 | 1.5 MB | ########## | 100%
2025-05-07T19:44:03.8346059Z pycparser-2.22 | 108 KB | #4 | 15%
2025-05-07T19:44:03.8369490Z pycparser-2.22 | 108 KB | ########## | 100%
2025-05-07T19:44:03.8454746Z expat-2.7.0 | 137 KB | ########## | 100%
2025-05-07T19:44:03.8489120Z libxcrypt-4.4.36 | 98 KB | #6 | 16%
2025-05-07T19:44:03.8574131Z libxcrypt-4.4.36 | 98 KB | ########## | 100%
2025-05-07T19:44:03.8575109Z libgcc-15.1.0 | 810 KB | ########## | 100%
2025-05-07T19:44:03.8594425Z libgcc-15.1.0 | 810 KB | ########## | 100%
2025-05-07T19:44:03.8627410Z zlib-1.2.13 | 91 KB | #7 | 18%
2025-05-07T19:44:03.8642371Z zlib-1.2.13 | 91 KB | ########## | 100%
2025-05-07T19:44:03.8712470Z python-3.12.2 | 30.8 MB | #6 | 16%
2025-05-07T19:44:03.8740517Z libexpat-2.7.0 | 73 KB | ##2 | 22%
2025-05-07T19:44:03.8825381Z libexpat-2.7.0 | 73 KB | ########## | 100%
2025-05-07T19:44:03.8850926Z libzlib-1.2.13 | 60 KB | ##6 | 27%
2025-05-07T19:44:03.8880001Z libzlib-1.2.13 | 60 KB | ########## | 100%
2025-05-07T19:44:03.8912066Z typing-extensions-4. | 88 KB | #8 | 18%
2025-05-07T19:44:03.8945855Z typing-extensions-4. | 88 KB | ########## | 100%
2025-05-07T19:44:03.8954696Z typing_extensions-4. | 51 KB | ###1 | 31%
2025-05-07T19:44:03.8959007Z libgomp-15.1.0 | 442 KB | ########## | 100%
2025-05-07T19:44:03.8966902Z libgomp-15.1.0 | 442 KB | ########## | 100%
2025-05-07T19:44:03.9136283Z typing_extensions-4. | 51 KB | ########## | 100%
2025-05-07T19:44:03.9155217Z libgcc-ng-15.1.0 | 34 KB | ####7 | 47%
2025-05-07T19:44:03.9249863Z libgcc-ng-15.1.0 | 34 KB | ########## | 100%
2025-05-07T19:44:03.9271922Z libuuid-2.38.1 | 33 KB | ####8 | 49%
2025-05-07T19:44:03.9286687Z libuuid-2.38.1 | 33 KB | ########## | 100%
2025-05-07T19:44:03.9296899Z ... (more hidden) ...
2025-05-07T19:44:03.9297405Z 2025-05-07T19:44:03.9297408Z 2025-05-07T19:44:03.9297427Z 2025-05-07T19:44:03.9297440Z 2025-05-07T19:44:03.9297443Z 2025-05-07T19:44:03.9297447Z 2025-05-07T19:44:03.9297450Z 2025-05-07T19:44:03.9297454Z 2025-05-07T19:44:03.9297457Z 2025-05-07T19:44:03.9297460Z 2025-05-07T19:44:03.9297464Z 2025-05-07T19:44:03.9297467Z 2025-05-07T19:44:03.9297470Z 2025-05-07T19:44:03.9297474Z 2025-05-07T19:44:03.9297477Z 2025-05-07T19:44:03.9297481Z 2025-05-07T19:44:03.9297490Z 2025-05-07T19:44:03.9297497Z 2025-05-07T19:44:03.9297500Z 2025-05-07T19:44:03.9331719Z ... (more hidden) ... 2025-05-07T19:44:03.9332118Z 2025-05-07T19:44:03.9332126Z 2025-05-07T19:44:03.9332134Z 2025-05-07T19:44:03.9332140Z 2025-05-07T19:44:03.9332147Z 2025-05-07T19:44:03.9332154Z 2025-05-07T19:44:03.9332162Z 2025-05-07T19:44:03.9332168Z 2025-05-07T19:44:03.9332731Z pyopenssl-25.0.0 | 120 KB | ########## | 100%  2025-05-07T19:44:03.9333081Z 2025-05-07T19:44:03.9333085Z 2025-05-07T19:44:03.9333089Z 2025-05-07T19:44:03.9333092Z 2025-05-07T19:44:03.9333096Z 2025-05-07T19:44:03.9333099Z 2025-05-07T19:44:03.9333103Z 2025-05-07T19:44:03.9333106Z 2025-05-07T19:44:03.9385237Z pyopenssl-25.0.0 | 120 KB | ########## | 100%  2025-05-07T19:44:03.9385672Z 2025-05-07T19:44:03.9385678Z 2025-05-07T19:44:03.9385681Z 2025-05-07T19:44:03.9385685Z 2025-05-07T19:44:03.9385701Z 2025-05-07T19:44:03.9385712Z 2025-05-07T19:44:03.9387670Z cffi-1.17.1 | 288 KB | ########## | 100%  2025-05-07T19:44:03.9388154Z 2025-05-07T19:44:03.9388162Z 2025-05-07T19:44:03.9388169Z 2025-05-07T19:44:03.9388202Z 2025-05-07T19:44:03.9388210Z 2025-05-07T19:44:03.9388221Z 2025-05-07T19:44:03.9392896Z cffi-1.17.1 | 288 KB | ########## | 100%  2025-05-07T19:44:03.9393398Z 2025-05-07T19:44:03.9393406Z 2025-05-07T19:44:03.9393412Z 2025-05-07T19:44:03.9393418Z 2025-05-07T19:44:03.9393457Z 2025-05-07T19:44:03.9393463Z 2025-05-07T19:44:03.9393470Z 2025-05-07T19:44:03.9393477Z 2025-05-07T19:44:03.9393485Z 2025-05-07T19:44:03.9393490Z 
2025-05-07T19:44:03.9393497Z 2025-05-07T19:44:03.9393505Z 2025-05-07T19:44:03.9393510Z 2025-05-07T19:44:03.9393517Z 2025-05-07T19:44:03.9393525Z 2025-05-07T19:44:03.9393530Z 2025-05-07T19:44:03.9393537Z 2025-05-07T19:44:03.9393552Z 2025-05-07T19:44:03.9414931Z libnsl-2.0.1 | 33 KB | ####9 | 49%  2025-05-07T19:44:03.9415344Z 2025-05-07T19:44:03.9415348Z 2025-05-07T19:44:03.9415352Z 2025-05-07T19:44:03.9415356Z 2025-05-07T19:44:03.9415359Z 2025-05-07T19:44:03.9415362Z 2025-05-07T19:44:03.9415366Z 2025-05-07T19:44:03.9415369Z 2025-05-07T19:44:03.9415373Z 2025-05-07T19:44:03.9415376Z 2025-05-07T19:44:03.9415379Z 2025-05-07T19:44:03.9415383Z 2025-05-07T19:44:03.9415386Z 2025-05-07T19:44:03.9415390Z 2025-05-07T19:44:03.9415393Z 2025-05-07T19:44:03.9415419Z 2025-05-07T19:44:03.9415422Z 2025-05-07T19:44:03.9415426Z 2025-05-07T19:44:03.9643151Z libnsl-2.0.1 | 33 KB | ########## | 100%  2025-05-07T19:44:04.0537221Z python-3.12.2 | 30.8 MB | ###1 | 32% 2025-05-07T19:44:04.0537753Z 2025-05-07T19:44:04.0538267Z openssl-3.5.0 | 3.0 MB | ########## | 100%  2025-05-07T19:44:04.0538566Z 2025-05-07T19:44:04.0645919Z openssl-3.5.0 | 3.0 MB | ########## | 100%  2025-05-07T19:44:04.0845483Z python-3.12.2 | 30.8 MB | ##### | 51% 2025-05-07T19:44:04.0846073Z 2025-05-07T19:44:04.0846082Z 2025-05-07T19:44:04.0846089Z 2025-05-07T19:44:04.0846097Z 2025-05-07T19:44:04.0846103Z 2025-05-07T19:44:04.0846110Z 2025-05-07T19:44:04.0846117Z 2025-05-07T19:44:04.0846125Z 2025-05-07T19:44:04.0846131Z 2025-05-07T19:44:04.0846138Z 2025-05-07T19:44:04.0846539Z libxcrypt-4.4.36 | 98 KB | ########## | 100%  2025-05-07T19:44:04.0846851Z 2025-05-07T19:44:04.0846854Z 2025-05-07T19:44:04.0846858Z 2025-05-07T19:44:04.0846861Z 2025-05-07T19:44:04.0846864Z 2025-05-07T19:44:04.0846868Z 2025-05-07T19:44:04.0846871Z 2025-05-07T19:44:04.0846875Z 2025-05-07T19:44:04.0846878Z 2025-05-07T19:44:04.0846881Z 2025-05-07T19:44:04.0853655Z libxcrypt-4.4.36 | 98 KB | ########## | 100%  2025-05-07T19:44:04.0853960Z 
2025-05-07T19:44:04.0853964Z 2025-05-07T19:44:04.0853982Z 2025-05-07T19:44:04.0853993Z 2025-05-07T19:44:04.0854003Z 2025-05-07T19:44:04.0854006Z 2025-05-07T19:44:04.0854010Z 2025-05-07T19:44:04.0854013Z 2025-05-07T19:44:04.0854016Z 2025-05-07T19:44:04.0859175Z pycparser-2.22 | 108 KB | ########## | 100%  2025-05-07T19:44:04.0859480Z 2025-05-07T19:44:04.0859483Z 2025-05-07T19:44:04.0859493Z 2025-05-07T19:44:04.0859496Z 2025-05-07T19:44:04.0859500Z 2025-05-07T19:44:04.0859503Z 2025-05-07T19:44:04.0859506Z 2025-05-07T19:44:04.0859509Z 2025-05-07T19:44:04.0859513Z 2025-05-07T19:44:04.1062901Z pycparser-2.22 | 108 KB | ########## | 100%  2025-05-07T19:44:04.1063307Z 2025-05-07T19:44:04.1063612Z 2025-05-07T19:44:04.1063621Z 2025-05-07T19:44:04.1063625Z 2025-05-07T19:44:04.1063631Z 2025-05-07T19:44:04.1063638Z 2025-05-07T19:44:04.1063644Z 2025-05-07T19:44:04.1063648Z 2025-05-07T19:44:04.1063653Z 2025-05-07T19:44:04.1063658Z 2025-05-07T19:44:04.1063663Z 2025-05-07T19:44:04.1063696Z 2025-05-07T19:44:04.1063701Z 2025-05-07T19:44:04.1064247Z libexpat-2.7.0 | 73 KB | ########## | 100%  2025-05-07T19:44:04.1064633Z 2025-05-07T19:44:04.1064636Z 2025-05-07T19:44:04.1064639Z 2025-05-07T19:44:04.1064643Z 2025-05-07T19:44:04.1064646Z 2025-05-07T19:44:04.1064650Z 2025-05-07T19:44:04.1064663Z 2025-05-07T19:44:04.1064666Z 2025-05-07T19:44:04.1064670Z 2025-05-07T19:44:04.1064673Z 2025-05-07T19:44:04.1064676Z 2025-05-07T19:44:04.1064680Z 2025-05-07T19:44:04.1064683Z 2025-05-07T19:44:04.1138291Z libexpat-2.7.0 | 73 KB | ########## | 100%  2025-05-07T19:44:04.1138699Z 2025-05-07T19:44:04.1138704Z 2025-05-07T19:44:04.1138707Z 2025-05-07T19:44:04.1138711Z 2025-05-07T19:44:04.1138714Z 2025-05-07T19:44:04.1138717Z 2025-05-07T19:44:04.1138721Z 2025-05-07T19:44:04.1138724Z 2025-05-07T19:44:04.1138728Z 2025-05-07T19:44:04.1138752Z 2025-05-07T19:44:04.1138756Z 2025-05-07T19:44:04.1139217Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:04.1139507Z 2025-05-07T19:44:04.1139510Z 
2025-05-07T19:44:04.1139514Z 2025-05-07T19:44:04.1139518Z 2025-05-07T19:44:04.1139521Z 2025-05-07T19:44:04.1139524Z 2025-05-07T19:44:04.1139550Z 2025-05-07T19:44:04.1139553Z 2025-05-07T19:44:04.1139556Z 2025-05-07T19:44:04.1139560Z 2025-05-07T19:44:04.1139568Z 2025-05-07T19:44:04.1306204Z zlib-1.2.13 | 91 KB | ########## | 100%  2025-05-07T19:44:04.1306555Z 2025-05-07T19:44:04.1306560Z 2025-05-07T19:44:04.1306585Z 2025-05-07T19:44:04.1306589Z 2025-05-07T19:44:04.1306592Z 2025-05-07T19:44:04.1306596Z 2025-05-07T19:44:04.1306600Z 2025-05-07T19:44:04.1306603Z 2025-05-07T19:44:04.1306607Z 2025-05-07T19:44:04.1306610Z 2025-05-07T19:44:04.1306614Z 2025-05-07T19:44:04.1306617Z 2025-05-07T19:44:04.1306621Z 2025-05-07T19:44:04.1306624Z 2025-05-07T19:44:04.1307204Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:04.1307705Z 2025-05-07T19:44:04.1307709Z 2025-05-07T19:44:04.1307712Z 2025-05-07T19:44:04.1307716Z 2025-05-07T19:44:04.1307719Z 2025-05-07T19:44:04.1307722Z 2025-05-07T19:44:04.1307726Z 2025-05-07T19:44:04.1307737Z 2025-05-07T19:44:04.1307740Z 2025-05-07T19:44:04.1307744Z 2025-05-07T19:44:04.1307747Z 2025-05-07T19:44:04.1307750Z 2025-05-07T19:44:04.1307754Z 2025-05-07T19:44:04.1307757Z 2025-05-07T19:44:04.1316224Z libzlib-1.2.13 | 60 KB | ########## | 100%  2025-05-07T19:44:04.1316534Z 2025-05-07T19:44:04.1316538Z 2025-05-07T19:44:04.1316541Z 2025-05-07T19:44:04.1316545Z 2025-05-07T19:44:04.1316548Z 2025-05-07T19:44:04.1316551Z 2025-05-07T19:44:04.1316555Z 2025-05-07T19:44:04.1316558Z 2025-05-07T19:44:04.1316562Z 2025-05-07T19:44:04.1316565Z 2025-05-07T19:44:04.1316568Z 2025-05-07T19:44:04.1317936Z 2025-05-07T19:44:04.1319950Z typing-extensions-4. 
| 88 KB | ########## | 100%  2025-05-07T19:44:04.1320290Z 2025-05-07T19:44:04.1320302Z 2025-05-07T19:44:04.1320305Z 2025-05-07T19:44:04.1320309Z 2025-05-07T19:44:04.1320312Z 2025-05-07T19:44:04.1320316Z 2025-05-07T19:44:04.1320319Z 2025-05-07T19:44:04.1320323Z 2025-05-07T19:44:04.1320349Z 2025-05-07T19:44:04.1320353Z 2025-05-07T19:44:04.1320356Z 2025-05-07T19:44:04.1320360Z 2025-05-07T19:44:04.1530259Z typing-extensions-4. | 88 KB | ########## | 100%  2025-05-07T19:44:04.1530758Z 2025-05-07T19:44:04.1530899Z 2025-05-07T19:44:04.1530907Z 2025-05-07T19:44:04.1530912Z 2025-05-07T19:44:04.1530917Z 2025-05-07T19:44:04.1530921Z 2025-05-07T19:44:04.1530925Z 2025-05-07T19:44:04.1530961Z 2025-05-07T19:44:04.1530994Z 2025-05-07T19:44:04.1530998Z 2025-05-07T19:44:04.1531002Z 2025-05-07T19:44:04.1531006Z 2025-05-07T19:44:04.1531009Z 2025-05-07T19:44:04.1531013Z 2025-05-07T19:44:04.1531016Z 2025-05-07T19:44:04.1531380Z typing_extensions-4. | 51 KB | ########## | 100%  2025-05-07T19:44:04.1531717Z 2025-05-07T19:44:04.1531742Z 2025-05-07T19:44:04.1531745Z 2025-05-07T19:44:04.1531749Z 2025-05-07T19:44:04.1531752Z 2025-05-07T19:44:04.1531756Z 2025-05-07T19:44:04.1531759Z 2025-05-07T19:44:04.1531763Z 2025-05-07T19:44:04.1531766Z 2025-05-07T19:44:04.1531770Z 2025-05-07T19:44:04.1531778Z 2025-05-07T19:44:04.1531782Z 2025-05-07T19:44:04.1531785Z 2025-05-07T19:44:04.1531788Z 2025-05-07T19:44:04.1531792Z 2025-05-07T19:44:04.1699177Z typing_extensions-4. 
| 51 KB | ########## | 100%  2025-05-07T19:44:04.1702694Z python-3.12.2 | 30.8 MB | ######5 | 65% 2025-05-07T19:44:04.1702949Z 2025-05-07T19:44:04.1702954Z 2025-05-07T19:44:04.1702958Z 2025-05-07T19:44:04.1702961Z 2025-05-07T19:44:04.1702966Z 2025-05-07T19:44:04.1702970Z 2025-05-07T19:44:04.1702973Z 2025-05-07T19:44:04.1702977Z 2025-05-07T19:44:04.1703002Z 2025-05-07T19:44:04.1703164Z 2025-05-07T19:44:04.1703182Z 2025-05-07T19:44:04.1703186Z 2025-05-07T19:44:04.1703189Z 2025-05-07T19:44:04.1703216Z 2025-05-07T19:44:04.1703220Z 2025-05-07T19:44:04.1703223Z 2025-05-07T19:44:04.1703231Z 2025-05-07T19:44:04.1705075Z libuuid-2.38.1 | 33 KB | ########## | 100%  2025-05-07T19:44:04.1705397Z 2025-05-07T19:44:04.1705400Z 2025-05-07T19:44:04.1705404Z 2025-05-07T19:44:04.1705407Z 2025-05-07T19:44:04.1705437Z 2025-05-07T19:44:04.1705440Z 2025-05-07T19:44:04.1705444Z 2025-05-07T19:44:04.1705447Z 2025-05-07T19:44:04.1705450Z 2025-05-07T19:44:04.1705454Z 2025-05-07T19:44:04.1705457Z 2025-05-07T19:44:04.1705460Z 2025-05-07T19:44:04.1705464Z 2025-05-07T19:44:04.1705467Z 2025-05-07T19:44:04.1705471Z 2025-05-07T19:44:04.1705474Z 2025-05-07T19:44:04.1705477Z 2025-05-07T19:44:04.1794322Z libuuid-2.38.1 | 33 KB | ########## | 100%  2025-05-07T19:44:04.1794899Z 2025-05-07T19:44:04.1794903Z 2025-05-07T19:44:04.1794907Z 2025-05-07T19:44:04.1794911Z 2025-05-07T19:44:04.1794914Z 2025-05-07T19:44:04.1794917Z 2025-05-07T19:44:04.1794921Z 2025-05-07T19:44:04.1794924Z 2025-05-07T19:44:04.1794928Z 2025-05-07T19:44:04.1794931Z 2025-05-07T19:44:04.1794935Z 2025-05-07T19:44:04.1794938Z 2025-05-07T19:44:04.1794942Z 2025-05-07T19:44:04.1794945Z 2025-05-07T19:44:04.1794948Z 2025-05-07T19:44:04.1794959Z 2025-05-07T19:44:04.1795821Z libgcc-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:04.1796138Z 2025-05-07T19:44:04.1796142Z 2025-05-07T19:44:04.1796145Z 2025-05-07T19:44:04.1796149Z 2025-05-07T19:44:04.1796152Z 2025-05-07T19:44:04.1796155Z 2025-05-07T19:44:04.1796159Z 
2025-05-07T19:44:04.1796162Z 2025-05-07T19:44:04.1796165Z 2025-05-07T19:44:04.1796191Z 2025-05-07T19:44:04.1796194Z 2025-05-07T19:44:04.1796197Z 2025-05-07T19:44:04.1796201Z 2025-05-07T19:44:04.1796209Z 2025-05-07T19:44:04.1796217Z 2025-05-07T19:44:04.1796224Z 2025-05-07T19:44:04.1824307Z libgcc-ng-15.1.0 | 34 KB | ########## | 100%  2025-05-07T19:44:04.1824711Z 2025-05-07T19:44:04.1824715Z 2025-05-07T19:44:04.1824719Z 2025-05-07T19:44:04.1824724Z 2025-05-07T19:44:04.1824727Z 2025-05-07T19:44:04.1824731Z 2025-05-07T19:44:04.1824734Z 2025-05-07T19:44:04.1824738Z 2025-05-07T19:44:04.1824741Z 2025-05-07T19:44:04.1824744Z 2025-05-07T19:44:04.1824748Z 2025-05-07T19:44:04.1824751Z 2025-05-07T19:44:04.1824755Z 2025-05-07T19:44:04.1824758Z 2025-05-07T19:44:04.1824761Z 2025-05-07T19:44:04.1824765Z 2025-05-07T19:44:04.1824768Z 2025-05-07T19:44:04.1824772Z 2025-05-07T19:44:04.1824775Z 2025-05-07T19:44:04.1905119Z ... (more hidden) ... 2025-05-07T19:44:04.1905459Z 2025-05-07T19:44:04.1905464Z 2025-05-07T19:44:04.1905468Z 2025-05-07T19:44:04.1905472Z 2025-05-07T19:44:04.1905490Z 2025-05-07T19:44:04.1905504Z 2025-05-07T19:44:04.1905507Z 2025-05-07T19:44:04.1905511Z 2025-05-07T19:44:04.1905535Z 2025-05-07T19:44:04.1905539Z 2025-05-07T19:44:04.1905542Z 2025-05-07T19:44:04.1905546Z 2025-05-07T19:44:04.1905549Z 2025-05-07T19:44:04.1905553Z 2025-05-07T19:44:04.1905556Z 2025-05-07T19:44:04.1905559Z 2025-05-07T19:44:04.1905563Z 2025-05-07T19:44:04.1905566Z 2025-05-07T19:44:04.1905880Z libnsl-2.0.1 | 33 KB | ########## | 100%  2025-05-07T19:44:04.1906222Z 2025-05-07T19:44:04.1906226Z 2025-05-07T19:44:04.1906229Z 2025-05-07T19:44:04.1906233Z 2025-05-07T19:44:04.1906236Z 2025-05-07T19:44:04.1906240Z 2025-05-07T19:44:04.1906243Z 2025-05-07T19:44:04.1906246Z 2025-05-07T19:44:04.1906250Z 2025-05-07T19:44:04.1906253Z 2025-05-07T19:44:04.1906257Z 2025-05-07T19:44:04.1906260Z 2025-05-07T19:44:04.1906264Z 2025-05-07T19:44:04.1906267Z 2025-05-07T19:44:04.1906270Z 2025-05-07T19:44:04.1906274Z 
2025-05-07T19:44:04.1949032Z libnsl-2.0.1 | 33 KB | ########## | 100% 2025-05-07T19:44:04.1949926Z cryptography-44.0.3 | 1.5 MB | ########## | 100% 2025-05-07T19:44:04.2744641Z cryptography-44.0.3 | 1.5 MB | ########## | 100% 2025-05-07T19:44:04.3674165Z python-3.12.2 | 30.8 MB | #########8 | 98% 2025-05-07T19:44:04.9842113Z python-3.12.2 | 30.8 MB | ########## | 100% 2025-05-07T19:44:04.9845653Z python-3.12.2 | 30.8 MB | ########## | 100%
2025-05-07T19:44:04.9856851Z 2025-05-07T19:44:04.9856854Z 2025-05-07T19:44:04.9856858Z 2025-05-07T19:44:04.9856888Z 2025-05-07T19:44:04.9856892Z 2025-05-07T19:44:04.9856895Z 2025-05-07T19:44:04.9856898Z 2025-05-07T19:44:04.9856902Z 2025-05-07T19:44:04.9857142Z  2025-05-07T19:44:04.9857405Z 2025-05-07T19:44:04.9857408Z 2025-05-07T19:44:04.9857412Z 2025-05-07T19:44:04.9857416Z 2025-05-07T19:44:04.9857446Z 2025-05-07T19:44:04.9857450Z 2025-05-07T19:44:04.9857457Z 2025-05-07T19:44:04.9857519Z 2025-05-07T19:44:04.9857523Z 2025-05-07T19:44:04.9857527Z 2025-05-07T19:44:04.9857530Z 2025-05-07T19:44:04.9857533Z 2025-05-07T19:44:04.9857537Z 2025-05-07T19:44:04.9857540Z 2025-05-07T19:44:04.9857544Z 2025-05-07T19:44:04.9857547Z 2025-05-07T19:44:04.9857550Z 2025-05-07T19:44:04.9857554Z 2025-05-07T19:44:04.9857798Z  2025-05-07T19:44:04.9858091Z 2025-05-07T19:44:04.9858184Z done 2025-05-07T19:44:05.0861915Z Preparing transaction: \ done 2025-05-07T19:44:05.8602257Z Verifying transaction: / - \ | / - \ done 2025-05-07T19:44:07.3771348Z Executing transaction: / - \ | / - \ | / - \ | / - \ done 2025-05-07T19:44:07.5858010Z [SETUP] Testing pyOpenSSL import ... 2025-05-07T19:44:09.2865817Z [CHECK] Python (sub-)package 'OpenSSL' found ... 2025-05-07T19:44:09.2884697Z [SETUP] Installing libxcrypt ... 2025-05-07T19:44:09.2910171Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y libxcrypt 2025-05-07T19:44:09.9527471Z Channels: 2025-05-07T19:44:09.9528126Z - conda-forge 2025-05-07T19:44:09.9528377Z Platform: linux-64 2025-05-07T19:44:13.0625343Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:13.4985093Z Solving environment: \ done 2025-05-07T19:44:13.5213438Z 2025-05-07T19:44:13.5214250Z # All requested packages already installed. 2025-05-07T19:44:13.5214956Z 2025-05-07T19:44:16.8445185Z [SETUP] Copying over ... 
2025-05-07T19:44:16.8446082Z + cp /github/home/miniconda/envs/build_binary/include/crypt.h /github/home/miniconda/envs/build_binary/include/python3.12/crypt.h 2025-05-07T19:44:16.8446683Z 2025-05-07T19:44:16.8475841Z 2025-05-07T19:44:18.4461036Z [SETUP] Installed Python version: Python 3.12.2 2025-05-07T19:44:18.4462325Z [SETUP] Successfully created Conda environment: build_binary 2025-05-07T19:44:18.4522474Z ##[group]Run . $PRELUDE; install_cxx_compiler $BUILD_ENV gcc 2025-05-07T19:44:18.4522980Z . $PRELUDE; install_cxx_compiler $BUILD_ENV gcc 2025-05-07T19:44:18.4523595Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:18.4523959Z env: 2025-05-07T19:44:18.4524195Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:18.4524535Z BUILD_ENV: build_binary 2025-05-07T19:44:18.4524795Z BUILD_TARGET: genai 2025-05-07T19:44:18.4525066Z BUILD_VARIANT: cuda 2025-05-07T19:44:18.4525316Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:18.4525612Z ##[endgroup] 2025-05-07T19:44:18.9049714Z ################################################################################ 2025-05-07T19:44:18.9050953Z # Install C/C++ Compilers 2025-05-07T19:44:18.9051259Z # 2025-05-07T19:44:18.9065402Z # [2025-05-07T19:44:18.905Z] + install_cxx_compiler build_binary gcc 2025-05-07T19:44:18.9066713Z ################################################################################ 2025-05-07T19:44:18.9067550Z 2025-05-07T19:44:18.9082796Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:18.9902428Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:18.9918778Z [INSTALL] Installing GLIBC (architecture = 64) ... 
2025-05-07T19:44:18.9943515Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y sysroot_linux-64=2.17 2025-05-07T19:44:19.6668108Z Channels: 2025-05-07T19:44:19.6668568Z - conda-forge 2025-05-07T19:44:19.6668853Z Platform: linux-64 2025-05-07T19:44:22.7625654Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:23.1918464Z Solving environment: \ done 2025-05-07T19:44:23.2427058Z 2025-05-07T19:44:23.2427558Z ## Package Plan ## 2025-05-07T19:44:23.2428030Z 2025-05-07T19:44:23.2428550Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:23.2428916Z 2025-05-07T19:44:23.2429028Z added / updated specs: 2025-05-07T19:44:23.2429341Z - sysroot_linux-64=2.17 2025-05-07T19:44:23.2429579Z 2025-05-07T19:44:23.2429583Z 2025-05-07T19:44:23.2429724Z The following packages will be downloaded: 2025-05-07T19:44:23.2429957Z 2025-05-07T19:44:23.2430112Z package | build 2025-05-07T19:44:23.2430585Z ---------------------------|----------------- 2025-05-07T19:44:23.2431049Z kernel-headers_linux-64-3.10.0| he073ed8_18 921 KB conda-forge 2025-05-07T19:44:23.2431567Z sysroot_linux-64-2.17 | h0157908_18 14.5 MB conda-forge 2025-05-07T19:44:23.2432141Z ------------------------------------------------------------ 2025-05-07T19:44:23.2432488Z Total: 15.4 MB 2025-05-07T19:44:23.2432803Z 2025-05-07T19:44:23.2433109Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:23.2433379Z 2025-05-07T19:44:23.2433922Z kernel-headers_li~ conda-forge/noarch::kernel-headers_linux-64-3.10.0-he073ed8_18 2025-05-07T19:44:23.2434546Z sysroot_linux-64 conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18 2025-05-07T19:44:23.2435181Z 2025-05-07T19:44:23.2435186Z 2025-05-07T19:44:23.2435190Z 2025-05-07T19:44:23.2435348Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:23.2435776Z sysroot_linux-64-2.1 | 14.5 MB | | 0% 2025-05-07T19:44:23.2436029Z 2025-05-07T19:44:23.4813822Z kernel-headers_linux | 921 KB | | 0%  2025-05-07T19:44:23.4838590Z sysroot_linux-64-2.1 | 14.5 MB | | 0% 2025-05-07T19:44:23.4838888Z 2025-05-07T19:44:23.5066977Z kernel-headers_linux | 921 KB | 1 | 2%  2025-05-07T19:44:23.5067298Z 2025-05-07T19:44:23.5814365Z kernel-headers_linux | 921 KB | ########## | 100%  2025-05-07T19:44:23.6490955Z sysroot_linux-64-2.1 | 14.5 MB | #########2 | 92% 2025-05-07T19:44:23.6956997Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:23.6957971Z 2025-05-07T19:44:23.6959343Z kernel-headers_linux | 921 KB | ########## | 100%  2025-05-07T19:44:23.6960273Z 2025-05-07T19:44:24.1215723Z kernel-headers_linux | 921 KB | ########## | 100%  2025-05-07T19:44:24.1217039Z sysroot_linux-64-2.1 | 14.5 MB | ########## | 100% 2025-05-07T19:44:24.1218090Z 2025-05-07T19:44:24.1218313Z 2025-05-07T19:44:24.1218532Z  done 2025-05-07T19:44:24.2225563Z Preparing transaction: / done 2025-05-07T19:44:24.4234340Z Verifying transaction: \ | done 2025-05-07T19:44:24.5244648Z Executing transaction: - done 2025-05-07T19:44:24.6102173Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:44:24.6103205Z [CHECK] CONDA_PREFIX is not set. 2025-05-07T19:44:26.2844790Z [CHECK] libstdc++.so.6 found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libstdc++.so.6 2025-05-07T19:44:26.2860511Z [INSTALL] Installing GCC (11.4.0, 64) through Conda ... 
2025-05-07T19:44:26.2888878Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y gxx_linux-64=11.4.0 2025-05-07T19:44:26.9785516Z Channels: 2025-05-07T19:44:26.9785877Z - conda-forge 2025-05-07T19:44:26.9786188Z Platform: linux-64 2025-05-07T19:44:30.1058482Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:44:31.2524530Z Solving environment: \ | / done 2025-05-07T19:44:31.3025116Z 2025-05-07T19:44:31.3025581Z ## Package Plan ## 2025-05-07T19:44:31.3025813Z 2025-05-07T19:44:31.3026077Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:44:31.3026414Z 2025-05-07T19:44:31.3026529Z added / updated specs: 2025-05-07T19:44:31.3026849Z - gxx_linux-64=11.4.0 2025-05-07T19:44:31.3027026Z 2025-05-07T19:44:31.3027032Z 2025-05-07T19:44:31.3027168Z The following packages will be downloaded: 2025-05-07T19:44:31.3027427Z 2025-05-07T19:44:31.3027556Z package | build 2025-05-07T19:44:31.3027986Z ---------------------------|----------------- 2025-05-07T19:44:31.3028495Z binutils_impl_linux-64-2.40| ha1999f0_7 6.0 MB conda-forge 2025-05-07T19:44:31.3029049Z binutils_linux-64-2.40 | hb3c18ed_4 28 KB conda-forge 2025-05-07T19:44:31.3029592Z gcc_impl_linux-64-11.4.0 | h00c12a0_13 53.0 MB conda-forge 2025-05-07T19:44:31.3030073Z gcc_linux-64-11.4.0 | ha077dfb_4 31 KB conda-forge 2025-05-07T19:44:31.3030580Z gxx_impl_linux-64-11.4.0 | h634f3ee_13 11.2 MB conda-forge 2025-05-07T19:44:31.3031056Z gxx_linux-64-11.4.0 | h35bfe5d_4 29 KB conda-forge 2025-05-07T19:44:31.3031559Z ld_impl_linux-64-2.40 | hf3520f5_7 691 KB conda-forge 2025-05-07T19:44:31.3032068Z libgcc-devel_linux-64-11.4.0| h8f596e0_113 2.3 MB conda-forge 2025-05-07T19:44:31.3032611Z libsanitizer-11.4.0 | h5763a12_13 3.5 MB conda-forge 2025-05-07T19:44:31.3033126Z libstdcxx-15.1.0 | h8f9b012_2 3.7 MB conda-forge 2025-05-07T19:44:31.3034038Z libstdcxx-devel_linux-64-11.4.0| h8f596e0_113 11.1 MB conda-forge 2025-05-07T19:44:31.3034603Z 
libstdcxx-ng-15.1.0 | h4852527_2 34 KB conda-forge 2025-05-07T19:44:31.3035048Z ------------------------------------------------------------ 2025-05-07T19:44:31.3035459Z Total: 91.6 MB 2025-05-07T19:44:31.3035693Z 2025-05-07T19:44:31.3035868Z The following NEW packages will be INSTALLED: 2025-05-07T19:44:31.3036108Z 2025-05-07T19:44:31.3036420Z binutils_impl_lin~ conda-forge/linux-64::binutils_impl_linux-64-2.40-ha1999f0_7 2025-05-07T19:44:31.3037060Z binutils_linux-64 conda-forge/linux-64::binutils_linux-64-2.40-hb3c18ed_4 2025-05-07T19:44:31.3037647Z gcc_impl_linux-64 conda-forge/linux-64::gcc_impl_linux-64-11.4.0-h00c12a0_13 2025-05-07T19:44:31.3038385Z gcc_linux-64 conda-forge/linux-64::gcc_linux-64-11.4.0-ha077dfb_4 2025-05-07T19:44:31.3038977Z gxx_impl_linux-64 conda-forge/linux-64::gxx_impl_linux-64-11.4.0-h634f3ee_13 2025-05-07T19:44:31.3039537Z gxx_linux-64 conda-forge/linux-64::gxx_linux-64-11.4.0-h35bfe5d_4 2025-05-07T19:44:31.3040154Z libgcc-devel_linu~ conda-forge/noarch::libgcc-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:31.3040769Z libsanitizer conda-forge/linux-64::libsanitizer-11.4.0-h5763a12_13 2025-05-07T19:44:31.3041344Z libstdcxx conda-forge/linux-64::libstdcxx-15.1.0-h8f9b012_2 2025-05-07T19:44:31.3041973Z libstdcxx-devel_l~ conda-forge/noarch::libstdcxx-devel_linux-64-11.4.0-h8f596e0_113 2025-05-07T19:44:31.3042365Z 2025-05-07T19:44:31.3042497Z The following packages will be UPDATED: 2025-05-07T19:44:31.3042726Z 2025-05-07T19:44:31.3043096Z ld_impl_linux-64 pkgs/main::ld_impl_linux-64-2.40-h12e~ --> conda-forge::ld_impl_linux-64-2.40-hf3520f5_7 2025-05-07T19:44:31.3043875Z libstdcxx-ng pkgs/main::libstdcxx-ng-11.2.0-h12345~ --> conda-forge::libstdcxx-ng-15.1.0-h4852527_2 2025-05-07T19:44:31.3044362Z 2025-05-07T19:44:31.3044366Z 2025-05-07T19:44:31.3044370Z 2025-05-07T19:44:31.3044531Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:44:31.5905383Z libstdcxx-15.1.0 | 3.7 MB | ########## | 100%
2025-05-07T19:44:31.6093595Z binutils_impl_linux- | 6.0 MB | ########## | 100%
2025-05-07T19:44:31.6625992Z gxx_impl_linux-64-11 | 11.2 MB | ########## | 100%
2025-05-07T19:44:31.7454762Z ld_impl_linux-64-2.4 | 691 KB | ########## | 100%
2025-05-07T19:44:31.7562676Z libgcc-devel_linux-6 | 2.3 MB | ########## | 100%
2025-05-07T19:44:31.8163329Z libstdcxx-ng-15.1.0 | 34 KB | ########## | 100%
2025-05-07T19:44:31.8205373Z gxx_linux-64-11.4.0 | 29 KB | ########## | 100%
2025-05-07T19:44:31.8398696Z gcc_linux-64-11.4.0 | 31 KB | ########## | 100%
2025-05-07T19:44:31.8576336Z libstdcxx-devel_linu | 11.1 MB | ########## | 100%
2025-05-07T19:44:31.8577363Z libsanitizer-11.4.0 | 3.5 MB | ########## | 100%
2025-05-07T19:44:31.9204134Z binutils_linux-64-2. | 28 KB | ########## | 100%
2025-05-07T19:44:32.2620647Z gcc_impl_linux-64-11 | 53.0 MB | ########## | 100%
2025-05-07T19:44:32.8273549Z done
2025-05-07T19:44:32.9279917Z Preparing transaction: done
2025-05-07T19:44:33.2307020Z Verifying transaction: done
2025-05-07T19:44:33.3323038Z Executing transaction: done
2025-05-07T19:44:33.4229006Z [INSTALL] Setting the C/C++ compiler symlinks ...
2025-05-07T19:44:37.1542737Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/cc
2025-05-07T19:44:37.1579655Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc /github/home/miniconda/envs/build_binary/bin/gcc
2025-05-07T19:44:37.1611412Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/c++
2025-05-07T19:44:37.1642595Z + ln -sf /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ /github/home/miniconda/envs/build_binary/bin/g++
2025-05-07T19:44:38.9771781Z /github/home/miniconda/envs/build_binary/bin/cc
2025-05-07T19:44:39.0345807Z [CHECK] Binary cc found in PATH
2025-05-07T19:44:40.8244716Z /github/home/miniconda/envs/build_binary/bin/gcc
2025-05-07T19:44:40.8821697Z [CHECK] Binary gcc found in PATH
2025-05-07T19:44:42.6826886Z /github/home/miniconda/envs/build_binary/bin/c++
2025-05-07T19:44:42.7402487Z [CHECK] Binary c++ found in PATH
2025-05-07T19:44:44.5213805Z /github/home/miniconda/envs/build_binary/bin/g++
2025-05-07T19:44:44.5802828Z [CHECK] Binary g++ found in PATH
2025-05-07T19:44:44.5803582Z [INFO] Printing out all preprocessor defines in the C compiler ...
2025-05-07T19:44:44.5805237Z + conda run -n build_binary cc -dM -E - 2025-05-07T19:44:44.5805464Z 2025-05-07T19:44:46.3732875Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:44:46.3733384Z #define __UINT_LEAST16_MAX__ 0xffff 2025-05-07T19:44:46.3734092Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:44:46.3734369Z #define __FLT128_MAX_10_EXP__ 4932 2025-05-07T19:44:46.3734932Z #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F 2025-05-07T19:44:46.3735326Z #define __GCC_IEC_559_COMPLEX 2 2025-05-07T19:44:46.3735647Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:44:46.3735991Z #define __SIZEOF_FLOAT80__ 16 2025-05-07T19:44:46.3736289Z #define __INTMAX_C(c) c ## L 2025-05-07T19:44:46.3736897Z #define __CHAR_BIT__ 8 2025-05-07T19:44:46.3737149Z #define __UINT8_MAX__ 0xff 2025-05-07T19:44:46.3737426Z #define __SCHAR_WIDTH__ 8 2025-05-07T19:44:46.3737688Z #define __WINT_MAX__ 0xffffffffU 2025-05-07T19:44:46.3737996Z #define __FLT32_MIN_EXP__ (-125) 2025-05-07T19:44:46.3738288Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:44:46.3738618Z #define __SIZE_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3738938Z #define __WCHAR_MAX__ 0x7fffffff 2025-05-07T19:44:46.3739255Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:44:46.3739611Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:44:46.3739944Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:44:46.3740386Z #define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L) 2025-05-07T19:44:46.3740821Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:44:46.3741162Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:44:46.3741607Z #define __GCC_IEC_559 2 2025-05-07T19:44:46.3741890Z #define __FLT32X_DECIMAL_DIG__ 17 2025-05-07T19:44:46.3742182Z #define __FLT_EVAL_METHOD__ 0 2025-05-07T19:44:46.3742477Z #define __FLT64_DECIMAL_DIG__ 17 2025-05-07T19:44:46.3742789Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:44:46.3743134Z #define 
__UINT_FAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3743624Z #define __SIG_ATOMIC_TYPE__ int 2025-05-07T19:44:46.3743905Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:44:46.3744217Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:44:46.3744492Z #define __FLT32X_MAX_EXP__ 1024 2025-05-07T19:44:46.3744794Z #define __FLT32_HAS_DENORM__ 1 2025-05-07T19:44:46.3745098Z #define __UINT_FAST8_MAX__ 0xff 2025-05-07T19:44:46.3745388Z #define __FLT32_MAX_10_EXP__ 38 2025-05-07T19:44:46.3745659Z #define __DEC64_MAX_EXP__ 385 2025-05-07T19:44:46.3745938Z #define __INT8_C(c) c 2025-05-07T19:44:46.3746200Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:44:46.3746507Z #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3746864Z #define __SHRT_MAX__ 0x7fff 2025-05-07T19:44:46.3747193Z #define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:46.3747579Z #define __FLT64X_MAX_10_EXP__ 4932 2025-05-07T19:44:46.3747865Z #define __LDBL_IS_IEC_60559__ 2 2025-05-07T19:44:46.3748159Z #define __FLT64X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3748450Z #define __UINT_LEAST8_MAX__ 0xff 2025-05-07T19:44:46.3748756Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:44:46.3749166Z #define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128 2025-05-07T19:44:46.3749620Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:44:46.3749942Z #define __linux 1 2025-05-07T19:44:46.3750177Z #define __DEC32_EPSILON__ 1E-6DF 2025-05-07T19:44:46.3750591Z #define __FLT_EVAL_METHOD_TS_18661_3__ 0 2025-05-07T19:44:46.3750865Z #define __unix 1 2025-05-07T19:44:46.3751104Z #define __UINT32_MAX__ 0xffffffffU 2025-05-07T19:44:46.3751374Z #define __FLT128_MIN_EXP__ (-16381) 2025-05-07T19:44:46.3751659Z #define __WINT_MIN__ 0U 2025-05-07T19:44:46.3751902Z #define __FLT128_MIN_10_EXP__ (-4931) 2025-05-07T19:44:46.3752192Z #define __FLT32X_IS_IEC_60559__ 2 2025-05-07T19:44:46.3752475Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:44:46.3752734Z 
#define __SCHAR_MAX__ 0x7f 2025-05-07T19:44:46.3752998Z #define __FLT128_MANT_DIG__ 113 2025-05-07T19:44:46.3753275Z #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1) 2025-05-07T19:44:46.3753584Z #define __INT64_C(c) c ## L 2025-05-07T19:44:46.3753843Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:44:46.3754156Z #define __FLT32X_MANT_DIG__ 53 2025-05-07T19:44:46.3754414Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:44:46.3754777Z #define __FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x 2025-05-07T19:44:46.3755150Z #define __STDC_HOSTED__ 1 2025-05-07T19:44:46.3755422Z #define __DEC64_MIN_EXP__ (-382) 2025-05-07T19:44:46.3755703Z #define __DBL_DIG__ 15 2025-05-07T19:44:46.3755924Z #define __FLT32_DIG__ 6 2025-05-07T19:44:46.3756237Z #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F 2025-05-07T19:44:46.3758838Z #define __SHRT_WIDTH__ 16 2025-05-07T19:44:46.3759099Z #define __FLT32_IS_IEC_60559__ 2 2025-05-07T19:44:46.3759414Z #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L 2025-05-07T19:44:46.3759767Z #define __STDC_UTF_16__ 1 2025-05-07T19:44:46.3760006Z #define __DBL_IS_IEC_60559__ 2 2025-05-07T19:44:46.3760280Z #define __DEC32_MAX__ 9.999999E96DF 2025-05-07T19:44:46.3760778Z #define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x 2025-05-07T19:44:46.3761171Z #define __FLT32X_HAS_INFINITY__ 1 2025-05-07T19:44:46.3761455Z #define __INT32_MAX__ 0x7fffffff 2025-05-07T19:44:46.3761706Z #define __unix__ 1 2025-05-07T19:44:46.3761939Z #define __INT_WIDTH__ 32 2025-05-07T19:44:46.3762176Z #define __SIZEOF_LONG__ 8 2025-05-07T19:44:46.3762433Z #define __STDC_IEC_559__ 1 2025-05-07T19:44:46.3762678Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:44:46.3762959Z #define __UINT16_C(c) c 2025-05-07T19:44:46.3763286Z #define __DECIMAL_DIG__ 21 2025-05-07T19:44:46.3763565Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:44:46.3763927Z #define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64 
2025-05-07T19:44:46.3764287Z #define __gnu_linux__ 1 2025-05-07T19:44:46.3764535Z #define __FLT128_IS_IEC_60559__ 2 2025-05-07T19:44:46.3783734Z #define __FLT64X_MIN_10_EXP__ (-4931) 2025-05-07T19:44:46.3784266Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3784599Z #define __FLT64_MANT_DIG__ 53 2025-05-07T19:44:46.3784880Z #define __FLT64X_MANT_DIG__ 64 2025-05-07T19:44:46.3785163Z #define __GNUC__ 11 2025-05-07T19:44:46.3785406Z #define __pie__ 2 2025-05-07T19:44:46.3785628Z #define __MMX__ 1 2025-05-07T19:44:46.3785885Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:44:46.3786169Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:44:46.3786485Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:44:46.3786772Z #define __FLT64_MAX_10_EXP__ 308 2025-05-07T19:44:46.3787175Z #define __DBL_MAX__ ((double)1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:46.3787615Z #define __INT_FAST32_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3787973Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:44:46.3788248Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:44:46.3788544Z #define __HAVE_SPECULATION_SAFE_VALUE 1 2025-05-07T19:44:46.3788878Z #define __DEC32_MIN_EXP__ (-94) 2025-05-07T19:44:46.3789155Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:44:46.3789454Z #define __FLT64X_HAS_INFINITY__ 1 2025-05-07T19:44:46.3789757Z #define __UINT_LEAST32_MAX__ 0xffffffffU 2025-05-07T19:44:46.3790086Z #define __FLT32X_HAS_DENORM__ 1 2025-05-07T19:44:46.3790370Z #define __INT_FAST16_TYPE__ long int 2025-05-07T19:44:46.3790686Z #define __MMX_WITH_SSE__ 1 2025-05-07T19:44:46.3790955Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:44:46.3791247Z #define __FLT128_HAS_INFINITY__ 1 2025-05-07T19:44:46.3791548Z #define __DEC32_MIN__ 1E-95DF 2025-05-07T19:44:46.3791816Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:44:46.3792103Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:44:46.3792442Z #define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:46.3792841Z #define __DEC128_EPSILON__ 
1E-33DL 2025-05-07T19:44:46.3793125Z #define __SSE2_MATH__ 1 2025-05-07T19:44:46.3793398Z #define __ATOMIC_HLE_RELEASE 131072 2025-05-07T19:44:46.3793709Z #define __PTRDIFF_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3794034Z #define __amd64 1 2025-05-07T19:44:46.3794265Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:44:46.3794561Z #define __ATOMIC_HLE_ACQUIRE 65536 2025-05-07T19:44:46.3794896Z #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL 2025-05-07T19:44:46.3795224Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:44:46.3795514Z #define __FLT64X_MIN_EXP__ (-16381) 2025-05-07T19:44:46.3795803Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:44:46.3796091Z #define __LONG_LONG_WIDTH__ 64 2025-05-07T19:44:46.3796484Z #define __FLT32_MAX_EXP__ 128 2025-05-07T19:44:46.3796768Z #define __GXX_ABI_VERSION 1016 2025-05-07T19:44:46.3797037Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:44:46.3797331Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:44:46.3797795Z #define __INT16_MAX__ 0x7fff 2025-05-07T19:44:46.3798068Z #define __x86_64 1 2025-05-07T19:44:46.3798326Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:44:46.3798710Z #define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64 2025-05-07T19:44:46.3799206Z #define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L) 2025-05-07T19:44:46.3799686Z #define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128 2025-05-07T19:44:46.3800192Z #define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:46.3800594Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:44:46.3800984Z #define __LP64__ 1 2025-05-07T19:44:46.3801208Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3801569Z #define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x 2025-05-07T19:44:46.3801960Z #define __DECIMAL_BID_FORMAT__ 1 2025-05-07T19:44:46.3802230Z #define __FLT64_MIN_EXP__ (-1021) 2025-05-07T19:44:46.3802610Z #define __FLT64_MIN_10_EXP__ (-307) 
2025-05-07T19:44:46.3802894Z #define __FLT64X_DECIMAL_DIG__ 21 2025-05-07T19:44:46.3803184Z #define __DEC128_MIN__ 1E-6143DL 2025-05-07T19:44:46.3803451Z #define __REGISTER_PREFIX__ 2025-05-07T19:44:46.3803725Z #define __UINT16_MAX__ 0xffff 2025-05-07T19:44:46.3803980Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:44:46.3804258Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:44:46.3804581Z #define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32 2025-05-07T19:44:46.3804960Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:44:46.3805251Z #define __FLT_DIG__ 6 2025-05-07T19:44:46.3805480Z #define __NO_INLINE__ 1 2025-05-07T19:44:46.3805741Z #define __DEC_EVAL_METHOD__ 2 2025-05-07T19:44:46.3806060Z #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL 2025-05-07T19:44:46.3806431Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:44:46.3806683Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:44:46.3806960Z #define __VERSION__ "11.4.0" 2025-05-07T19:44:46.3807213Z #define __UINT64_C(c) c ## UL 2025-05-07T19:44:46.3807480Z #define _STDC_PREDEF_H 1 2025-05-07T19:44:46.3807748Z #define __INT_LEAST32_MAX__ 0x7fffffff 2025-05-07T19:44:46.3808035Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:44:46.3808327Z #define __FLT128_MAX_EXP__ 16384 2025-05-07T19:44:46.3808585Z #define __FLT32_MANT_DIG__ 24 2025-05-07T19:44:46.3809307Z #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:46.3809652Z #define __FLT128_HAS_DENORM__ 1 2025-05-07T19:44:46.3809953Z #define __FLT32_DECIMAL_DIG__ 9 2025-05-07T19:44:46.3810221Z #define __FLT128_DIG__ 33 2025-05-07T19:44:46.3810491Z #define __INT32_C(c) c 2025-05-07T19:44:46.3810741Z #define __DEC64_EPSILON__ 1E-15DD 2025-05-07T19:44:46.3811053Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:44:46.3811362Z #define __DEC128_MIN_EXP__ (-6142) 2025-05-07T19:44:46.3811656Z #define __INT_FAST32_TYPE__ long int 2025-05-07T19:44:46.3812002Z #define __UINT_LEAST16_TYPE__ short unsigned int 2025-05-07T19:44:46.3812338Z 
#define unix 1 2025-05-07T19:44:46.3812600Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:44:46.3812923Z #define __UINT64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3813272Z #define __FLT_IS_IEC_60559__ 2 2025-05-07T19:44:46.3813623Z #define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE" 2025-05-07T19:44:46.3814026Z #define __FLT64X_DIG__ 18 2025-05-07T19:44:46.3814315Z #define __INT8_TYPE__ signed char 2025-05-07T19:44:46.3814635Z #define __ELF__ 1 2025-05-07T19:44:46.3814926Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:44:46.3815243Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:44:46.3815575Z #define __FLT_RADIX__ 2 2025-05-07T19:44:46.3815856Z #define __INT_LEAST16_TYPE__ short int 2025-05-07T19:44:46.3816284Z #define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L 2025-05-07T19:44:46.3816687Z #define __UINTMAX_C(c) c ## UL 2025-05-07T19:44:46.3816997Z #define __SSE_MATH__ 1 2025-05-07T19:44:46.3817257Z #define __k8 1 2025-05-07T19:44:46.3817623Z #define __FLT32X_MIN__ 2.22507385850720138309023271733240406e-308F32x 2025-05-07T19:44:46.3818271Z #define __SIG_ATOMIC_MAX__ 0x7fffffff 2025-05-07T19:44:46.3818601Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:44:46.3818964Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:44:46.3819259Z #define __LDBL_DIG__ 18 2025-05-07T19:44:46.3819565Z #define __FLT64_IS_IEC_60559__ 2 2025-05-07T19:44:46.3819855Z #define __x86_64__ 1 2025-05-07T19:44:46.3820151Z #define __FLT32X_MIN_EXP__ (-1021) 2025-05-07T19:44:46.3820485Z #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF 2025-05-07T19:44:46.3820888Z #define __INT_FAST16_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3821363Z #define __FLT64_DIG__ 15 2025-05-07T19:44:46.3821684Z #define __UINT_FAST32_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3822076Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:44:46.3822400Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3822682Z #define __FLT_MAX_10_EXP__ 38 
2025-05-07T19:44:46.3823057Z #define __LONG_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3823471Z #define __FLT64X_HAS_DENORM__ 1 2025-05-07T19:44:46.3824017Z #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL 2025-05-07T19:44:46.3824454Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:44:46.3824840Z #define __GNUC_EXECUTION_CHARSET_NAME "UTF-8" 2025-05-07T19:44:46.3825209Z #define __UINT_FAST16_TYPE__ long unsigned int 2025-05-07T19:44:46.3825563Z #define __DEC64_MAX__ 9.999999999999999E384DD 2025-05-07T19:44:46.3825873Z #define __INT_FAST32_WIDTH__ 64 2025-05-07T19:44:46.3826181Z #define __CHAR16_TYPE__ short unsigned int 2025-05-07T19:44:46.3826502Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:44:46.3826837Z #define __SIZE_WIDTH__ 64 2025-05-07T19:44:46.3827084Z #define __SEG_FS 1 2025-05-07T19:44:46.3827368Z #define __INT_LEAST16_MAX__ 0x7fff 2025-05-07T19:44:46.3827673Z #define __DEC64_MANT_DIG__ 16 2025-05-07T19:44:46.3828000Z #define __INT64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3828324Z #define __SEG_GS 1 2025-05-07T19:44:46.3828701Z #define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32 2025-05-07T19:44:46.3829144Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:44:46.3829447Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:44:46.3829794Z #define __INT16_TYPE__ short int 2025-05-07T19:44:46.3830103Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:44:46.3830461Z #define __STDC_VERSION__ 201710L 2025-05-07T19:44:46.3830763Z #define __SIZEOF_INT__ 4 2025-05-07T19:44:46.3831065Z #define __DEC32_MAX_EXP__ 97 2025-05-07T19:44:46.3831363Z #define __INT_FAST8_MAX__ 0x7f 2025-05-07T19:44:46.3831742Z #define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:46.3832152Z #define __INTPTR_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3832474Z #define linux 1 2025-05-07T19:44:46.3832722Z #define __FLT64_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3833000Z #define __FLT32_MIN_10_EXP__ 
(-37) 2025-05-07T19:44:46.3833299Z #define __FLT32X_DIG__ 15 2025-05-07T19:44:46.3833564Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:44:46.3833852Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:44:46.3834127Z #define __FLT64_HAS_INFINITY__ 1 2025-05-07T19:44:46.3834506Z #define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:46.3834945Z #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) 2025-05-07T19:44:46.3835312Z #define __code_model_small__ 1 2025-05-07T19:44:46.3835615Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:44:46.3835913Z #define __DEC32_MANT_DIG__ 7 2025-05-07T19:44:46.3836310Z #define __k8__ 1 2025-05-07T19:44:46.3836552Z #define __INTPTR_TYPE__ long int 2025-05-07T19:44:46.3836880Z #define __UINT16_TYPE__ short unsigned int 2025-05-07T19:44:46.3837192Z #define __WCHAR_TYPE__ int 2025-05-07T19:44:46.3837473Z #define __pic__ 2 2025-05-07T19:44:46.3837726Z #define __UINTPTR_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3838077Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:44:46.3838384Z #define __INT_FAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3838767Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:44:46.3839276Z #define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:46.3839648Z #define __FLT32_HAS_INFINITY__ 1 2025-05-07T19:44:46.3839970Z #define __FLT64X_MAX_EXP__ 16384 2025-05-07T19:44:46.3840278Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:44:46.3840629Z #define __INT_MAX__ 0x7fffffff 2025-05-07T19:44:46.3840900Z #define __linux__ 1 2025-05-07T19:44:46.3841169Z #define __INT64_TYPE__ long int 2025-05-07T19:44:46.3841443Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:44:46.3841728Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:44:46.3842003Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:44:46.3842273Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:44:46.3842558Z #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3842895Z #define 
__GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:44:46.3843184Z #define __DEC64_MIN__ 1E-383DD 2025-05-07T19:44:46.3843457Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:44:46.3843830Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:44:46.3844149Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:44:46.3844464Z #define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:46.3844828Z #define __SSE__ 1 2025-05-07T19:44:46.3845066Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:44:46.3845392Z #define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:46.3845745Z #define __amd64__ 1 2025-05-07T19:44:46.3845962Z #define __WINT_WIDTH__ 32 2025-05-07T19:44:46.3846219Z #define __INT_LEAST8_MAX__ 0x7f 2025-05-07T19:44:46.3846479Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:44:46.3846757Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:44:46.3847016Z #define __FLT32X_MAX_10_EXP__ 308 2025-05-07T19:44:46.3847295Z #define __SIZEOF_INT128__ 16 2025-05-07T19:44:46.3847551Z #define __FLT64X_IS_IEC_60559__ 2 2025-05-07T19:44:46.3847827Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:44:46.3848101Z #define __ATOMIC_RELAXED 0 2025-05-07T19:44:46.3848441Z #define __DBL_EPSILON__ ((double)2.22044604925031308084726333618164062e-16L) 2025-05-07T19:44:46.3848919Z #define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128 2025-05-07T19:44:46.3849273Z #define _LP64 1 2025-05-07T19:44:46.3849498Z #define __UINT8_C(c) c 2025-05-07T19:44:46.3849731Z #define __FLT64_MAX_EXP__ 1024 2025-05-07T19:44:46.3850004Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:44:46.3850266Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:44:46.3850549Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:44:46.3850842Z #define __GNUC_PATCHLEVEL__ 0 2025-05-07T19:44:46.3851207Z #define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:46.3851679Z #define __FLT64_NORM_MAX__ 
1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:46.3852045Z #define __FLT128_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3852347Z #define __INTMAX_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:46.3852651Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:44:46.3853027Z #define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x 2025-05-07T19:44:46.3853396Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:44:46.3853666Z #define __FLT64_HAS_DENORM__ 1 2025-05-07T19:44:46.3853991Z #define __FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32 2025-05-07T19:44:46.3854366Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:44:46.3854631Z #define __STDC_UTF_32__ 1 2025-05-07T19:44:46.3854870Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:44:46.3855126Z #define __FXSR__ 1 2025-05-07T19:44:46.3855416Z #define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:46.3855879Z #define __DBL_NORM_MAX__ ((double)1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:46.3856277Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:46.3856591Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:44:46.3856836Z #define __UINT32_C(c) c ## U 2025-05-07T19:44:46.3857176Z #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F 2025-05-07T19:44:46.3857635Z #define __INT8_MAX__ 0x7f 2025-05-07T19:44:46.3857869Z #define __LONG_WIDTH__ 64 2025-05-07T19:44:46.3858115Z #define __PIC__ 2 2025-05-07T19:44:46.3858359Z #define __UINT_FAST32_TYPE__ long unsigned int 2025-05-07T19:44:46.3858767Z #define __FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:46.3859150Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:44:46.3859500Z #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:46.3859825Z #define __SSE2__ 1 2025-05-07T19:44:46.3860066Z #define __INT32_TYPE__ int 2025-05-07T19:44:46.3860321Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:44:46.3860568Z #define __FLT_MIN_10_EXP__ 
(-37) 2025-05-07T19:44:46.3860906Z #define __FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64 2025-05-07T19:44:46.3861260Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:44:46.3861538Z #define __INTMAX_TYPE__ long int 2025-05-07T19:44:46.3861799Z #define __DEC128_MAX_EXP__ 6145 2025-05-07T19:44:46.3862153Z #define __FLT32X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3862428Z #define __ATOMIC_CONSUME 1 2025-05-07T19:44:46.3862843Z #define __GNUC_MINOR__ 4 2025-05-07T19:44:46.3863082Z #define __INT_FAST16_WIDTH__ 64 2025-05-07T19:44:46.3863475Z #define __UINTMAX_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3863962Z #define __PIE__ 2 2025-05-07T19:44:46.3864300Z #define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x 2025-05-07T19:44:46.3864817Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:44:46.3865180Z #define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L 2025-05-07T19:44:46.3865583Z #define __INT16_C(c) c 2025-05-07T19:44:46.3865823Z #define __STDC__ 1 2025-05-07T19:44:46.3866081Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:44:46.3866371Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:44:46.3866654Z #define __FLT32X_MIN_10_EXP__ (-307) 2025-05-07T19:44:46.3866979Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:44:46.3867348Z #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD 2025-05-07T19:44:46.3867719Z #define __DEC128_MANT_DIG__ 34 2025-05-07T19:44:46.3867995Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:44:46.3868304Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:44:46.3868580Z #define __FLT128_DECIMAL_DIG__ 36 2025-05-07T19:44:46.3868893Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:44:46.3869196Z #define __FLT32_HAS_QUIET_NAN__ 1 2025-05-07T19:44:46.3869501Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:44:46.3869808Z #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:46.3870228Z #define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:46.3870643Z 
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:44:46.3870952Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:44:46.3871278Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:44:46.3871538Z #define __ATOMIC_RELEASE 3 2025-05-07T19:44:46.3871715Z 2025-05-07T19:44:46.4305631Z 2025-05-07T19:44:46.4306627Z [INFO] Printing out all preprocessor defines in the C++ compiler ... 2025-05-07T19:44:46.4308010Z + conda run -n build_binary c++ -dM -E -x c++ - 2025-05-07T19:44:46.4309121Z 2025-05-07T19:44:48.2463993Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:44:48.2464559Z #define __cpp_attributes 200809L 2025-05-07T19:44:48.2464942Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:44:48.2465382Z #define __UINT_LEAST16_MAX__ 0xffff 2025-05-07T19:44:48.2465692Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:44:48.2466008Z #define __FLT128_MAX_10_EXP__ 4932 2025-05-07T19:44:48.2466365Z #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F 2025-05-07T19:44:48.2466774Z #define __GCC_IEC_559_COMPLEX 2 2025-05-07T19:44:48.2467076Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:44:48.2467437Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:44:48.2467797Z #define __SIZEOF_FLOAT80__ 16 2025-05-07T19:44:48.2468083Z #define __INTMAX_C(c) c ## L 2025-05-07T19:44:48.2468384Z #define __CHAR_BIT__ 8 2025-05-07T19:44:48.2468699Z #define __UINT8_MAX__ 0xff 2025-05-07T19:44:48.2469313Z #define __SCHAR_WIDTH__ 8 2025-05-07T19:44:48.2469732Z #define __WINT_MAX__ 0xffffffffU 2025-05-07T19:44:48.2470050Z #define __FLT32_MIN_EXP__ (-125) 2025-05-07T19:44:48.2470345Z #define __cpp_static_assert 201411L 2025-05-07T19:44:48.2470684Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:44:48.2471033Z #define __SIZE_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2471361Z #define __WCHAR_MAX__ 0x7fffffff 2025-05-07T19:44:48.2471697Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:44:48.2472045Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 
2025-05-07T19:44:48.2472421Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:44:48.2472859Z #define __DBL_DENORM_MIN__ double(4.94065645841246544176568792868221372e-324L) 2025-05-07T19:44:48.2473332Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:44:48.2473673Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:44:48.2474010Z #define __GCC_IEC_559 2 2025-05-07T19:44:48.2474452Z #define __FLT32X_DECIMAL_DIG__ 17 2025-05-07T19:44:48.2474777Z #define __FLT_EVAL_METHOD__ 0 2025-05-07T19:44:48.2475110Z #define __cpp_binary_literals 201304L 2025-05-07T19:44:48.2475426Z #define __FLT64_DECIMAL_DIG__ 17 2025-05-07T19:44:48.2475771Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:44:48.2476120Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:44:48.2476484Z #define __cpp_variadic_templates 200704L 2025-05-07T19:44:48.2476844Z #define __UINT_FAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2477222Z #define __SIG_ATOMIC_TYPE__ int 2025-05-07T19:44:48.2477515Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:44:48.2477845Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:44:48.2478174Z #define __cpp_variable_templates 201304L 2025-05-07T19:44:48.2478500Z #define __FLT32X_MAX_EXP__ 1024 2025-05-07T19:44:48.2478812Z #define __FLT32_HAS_DENORM__ 1 2025-05-07T19:44:48.2479100Z #define __UINT_FAST8_MAX__ 0xff 2025-05-07T19:44:48.2479422Z #define __cpp_rvalue_reference 200610L 2025-05-07T19:44:48.2479783Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:44:48.2480170Z #define __DEC64_MAX_EXP__ 385 2025-05-07T19:44:48.2480444Z #define __INT8_C(c) c 2025-05-07T19:44:48.2480729Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:44:48.2481019Z #define __cpp_variadic_using 201611L 2025-05-07T19:44:48.2481384Z #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2481751Z #define __INT_LEAST8_MAX__ 0x7f 2025-05-07T19:44:48.2482039Z #define __cpp_capture_star_this 201603L 2025-05-07T19:44:48.2482346Z #define __SHRT_MAX__ 
0x7fff 2025-05-07T19:44:48.2482663Z #define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:48.2483044Z #define __FLT64X_MAX_10_EXP__ 4932 2025-05-07T19:44:48.2483329Z #define __cpp_if_constexpr 201606L 2025-05-07T19:44:48.2483621Z #define __LDBL_IS_IEC_60559__ 2 2025-05-07T19:44:48.2483900Z #define __FLT64X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2484290Z #define __UINT_LEAST8_MAX__ 0xff 2025-05-07T19:44:48.2484585Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:44:48.2484989Z #define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128 2025-05-07T19:44:48.2485425Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:44:48.2485717Z #define __linux 1 2025-05-07T19:44:48.2485961Z #define __DEC32_EPSILON__ 1E-6DF 2025-05-07T19:44:48.2486243Z #define __FLT_EVAL_METHOD_TS_18661_3__ 0 2025-05-07T19:44:48.2486537Z #define __unix 1 2025-05-07T19:44:48.2486768Z #define __UINT32_MAX__ 0xffffffffU 2025-05-07T19:44:48.2487044Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:44:48.2487339Z #define __FLT128_MIN_EXP__ (-16381) 2025-05-07T19:44:48.2487603Z #define __WINT_MIN__ 0U 2025-05-07T19:44:48.2487858Z #define __FLT128_MIN_10_EXP__ (-4931) 2025-05-07T19:44:48.2488190Z #define __FLT32X_IS_IEC_60559__ 2 2025-05-07T19:44:48.2488481Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:44:48.2488789Z #define __SCHAR_MAX__ 0x7f 2025-05-07T19:44:48.2489060Z #define __FLT128_MANT_DIG__ 113 2025-05-07T19:44:48.2489400Z #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1) 2025-05-07T19:44:48.2489811Z #define __INT64_C(c) c ## L 2025-05-07T19:44:48.2490128Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:44:48.2490479Z #define __FLT32X_MANT_DIG__ 53 2025-05-07T19:44:48.2490779Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:44:48.2491132Z #define __cpp_aligned_new 201606L 2025-05-07T19:44:48.2491434Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:44:48.2491746Z #define __FLT32_MAX_10_EXP__ 38 2025-05-07T19:44:48.2492120Z #define 
__FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x 2025-05-07T19:44:48.2492554Z #define __STDC_HOSTED__ 1 2025-05-07T19:44:48.2492828Z #define __DEC64_MIN_EXP__ (-382) 2025-05-07T19:44:48.2493157Z #define __cpp_decltype_auto 201304L 2025-05-07T19:44:48.2493456Z #define __DBL_DIG__ 15 2025-05-07T19:44:48.2493735Z #define __FLT32_DIG__ 6 2025-05-07T19:44:48.2494086Z #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F 2025-05-07T19:44:48.2494464Z #define __GXX_WEAK__ 1 2025-05-07T19:44:48.2494838Z #define __SHRT_WIDTH__ 16 2025-05-07T19:44:48.2495115Z #define __FLT32_IS_IEC_60559__ 2 2025-05-07T19:44:48.2495485Z #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L 2025-05-07T19:44:48.2495864Z #define __DBL_IS_IEC_60559__ 2 2025-05-07T19:44:48.2496168Z #define __DEC32_MAX__ 9.999999E96DF 2025-05-07T19:44:48.2496485Z #define __cpp_threadsafe_static_init 200806L 2025-05-07T19:44:48.2496849Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:44:48.2497294Z #define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x 2025-05-07T19:44:48.2497713Z #define __FLT32X_HAS_INFINITY__ 1 2025-05-07T19:44:48.2498013Z #define __INT32_MAX__ 0x7fffffff 2025-05-07T19:44:48.2498282Z #define __unix__ 1 2025-05-07T19:44:48.2498531Z #define __INT_WIDTH__ 32 2025-05-07T19:44:48.2498777Z #define __SIZEOF_LONG__ 8 2025-05-07T19:44:48.2499054Z #define __STDC_IEC_559__ 1 2025-05-07T19:44:48.2499317Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:44:48.2499609Z #define __UINT16_C(c) c 2025-05-07T19:44:48.2499853Z #define __DECIMAL_DIG__ 21 2025-05-07T19:44:48.2500134Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:44:48.2500524Z #define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64 2025-05-07T19:44:48.2500904Z #define __gnu_linux__ 1 2025-05-07T19:44:48.2501167Z #define __INT16_MAX__ 0x7fff 2025-05-07T19:44:48.2501435Z #define __FLT64_MIN_EXP__ (-1021) 2025-05-07T19:44:48.2501743Z #define __FLT64X_MIN_10_EXP__ 
(-4931) 2025-05-07T19:44:48.2502039Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2502340Z #define __FLT64_MANT_DIG__ 53 2025-05-07T19:44:48.2502610Z #define __FLT64X_MANT_DIG__ 64 2025-05-07T19:44:48.2502895Z #define __GNUC__ 11 2025-05-07T19:44:48.2503123Z #define __GXX_RTTI 1 2025-05-07T19:44:48.2503469Z #define __pie__ 2 2025-05-07T19:44:48.2503888Z #define __MMX__ 1 2025-05-07T19:44:48.2504134Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:44:48.2504449Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:44:48.2504756Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:44:48.2505071Z #define __STDC_UTF_16__ 1 2025-05-07T19:44:48.2505347Z #define __FLT64_MAX_10_EXP__ 308 2025-05-07T19:44:48.2505688Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:44:48.2506028Z #define __FLT32_HAS_INFINITY__ 1 2025-05-07T19:44:48.2506419Z #define __DBL_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:48.2506814Z #define __cpp_raw_strings 200710L 2025-05-07T19:44:48.2507158Z #define __INT_FAST32_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2507516Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:44:48.2507798Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:44:48.2508098Z #define __HAVE_SPECULATION_SAFE_VALUE 1 2025-05-07T19:44:48.2508419Z #define __cpp_fold_expressions 201603L 2025-05-07T19:44:48.2508963Z #define __DEC32_MIN_EXP__ (-94) 2025-05-07T19:44:48.2509250Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:44:48.2509553Z #define __FLT64X_HAS_INFINITY__ 1 2025-05-07T19:44:48.2509851Z #define __UINT_LEAST32_MAX__ 0xffffffffU 2025-05-07T19:44:48.2510198Z #define __FLT32X_HAS_DENORM__ 1 2025-05-07T19:44:48.2510642Z #define __INT_FAST16_TYPE__ long int 2025-05-07T19:44:48.2510936Z #define __MMX_WITH_SSE__ 1 2025-05-07T19:44:48.2511233Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:44:48.2511512Z #define __cplusplus 201703L 2025-05-07T19:44:48.2511824Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:44:48.2512122Z #define __DEC32_MIN__ 1E-95DF 
2025-05-07T19:44:48.2512417Z #define __DEPRECATED 1 2025-05-07T19:44:48.2512680Z #define __cpp_rvalue_references 200610L 2025-05-07T19:44:48.2513025Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:44:48.2513294Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:44:48.2513651Z #define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:48.2514057Z #define __DEC128_EPSILON__ 1E-33DL 2025-05-07T19:44:48.2514344Z #define __SSE2_MATH__ 1 2025-05-07T19:44:48.2514628Z #define __ATOMIC_HLE_RELEASE 131072 2025-05-07T19:44:48.2514944Z #define __PTRDIFF_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2515369Z #define __amd64 1 2025-05-07T19:44:48.2515617Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:44:48.2515925Z #define __ATOMIC_HLE_ACQUIRE 65536 2025-05-07T19:44:48.2516208Z #define __GNUG__ 11 2025-05-07T19:44:48.2516502Z #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL 2025-05-07T19:44:48.2516835Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:44:48.2517137Z #define __cpp_nsdmi 200809L 2025-05-07T19:44:48.2517434Z #define __FLT64X_MIN_EXP__ (-16381) 2025-05-07T19:44:48.2517728Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:44:48.2518017Z #define __LONG_LONG_WIDTH__ 64 2025-05-07T19:44:48.2518312Z #define __cpp_initializer_lists 200806L 2025-05-07T19:44:48.2518645Z #define __FLT32_MAX_EXP__ 128 2025-05-07T19:44:48.2518920Z #define __cpp_hex_float 201603L 2025-05-07T19:44:48.2519225Z #define __GXX_ABI_VERSION 1016 2025-05-07T19:44:48.2519506Z #define __FLT128_HAS_INFINITY__ 1 2025-05-07T19:44:48.2519824Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:44:48.2520111Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:44:48.2520431Z #define __x86_64 1 2025-05-07T19:44:48.2520706Z #define __cpp_lambdas 200907L 2025-05-07T19:44:48.2521211Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:44:48.2521604Z #define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64 2025-05-07T19:44:48.2521996Z #define __cpp_template_auto 201606L 2025-05-07T19:44:48.2522380Z #define 
__DBL_MIN__ double(2.22507385850720138309023271733240406e-308L) 2025-05-07T19:44:48.2522833Z #define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128 2025-05-07T19:44:48.2523337Z #define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:48.2523727Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:44:48.2524017Z #define __LP64__ 1 2025-05-07T19:44:48.2524281Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2524635Z #define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x 2025-05-07T19:44:48.2525038Z #define __DECIMAL_BID_FORMAT__ 1 2025-05-07T19:44:48.2525313Z #define __FLT64_MIN_10_EXP__ (-307) 2025-05-07T19:44:48.2525627Z #define __FLT64X_DECIMAL_DIG__ 21 2025-05-07T19:44:48.2525914Z #define __DEC128_MIN__ 1E-6143DL 2025-05-07T19:44:48.2526213Z #define __REGISTER_PREFIX__ 2025-05-07T19:44:48.2526479Z #define __UINT16_MAX__ 0xffff 2025-05-07T19:44:48.2526772Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:44:48.2527105Z #define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32 2025-05-07T19:44:48.2527500Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:44:48.2527810Z #define __FLT_DIG__ 6 2025-05-07T19:44:48.2528048Z #define __NO_INLINE__ 1 2025-05-07T19:44:48.2528464Z #define __DEC_EVAL_METHOD__ 2 2025-05-07T19:44:48.2528784Z #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL 2025-05-07T19:44:48.2529159Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:44:48.2529416Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:44:48.2529699Z #define __VERSION__ "11.4.0" 2025-05-07T19:44:48.2529953Z #define __UINT64_C(c) c ## UL 2025-05-07T19:44:48.2530251Z #define __cpp_unicode_characters 201411L 2025-05-07T19:44:48.2530571Z #define _STDC_PREDEF_H 1 2025-05-07T19:44:48.2530902Z #define __INT_LEAST32_MAX__ 0x7fffffff 2025-05-07T19:44:48.2531220Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:44:48.2531508Z #define __FLT128_MAX_EXP__ 16384 2025-05-07T19:44:48.2531806Z #define __FLT32_MANT_DIG__ 
24 2025-05-07T19:44:48.2532108Z #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:48.2532471Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:44:48.2532772Z #define __FLT128_HAS_DENORM__ 1 2025-05-07T19:44:48.2533068Z #define __FLT32_DECIMAL_DIG__ 9 2025-05-07T19:44:48.2533332Z #define __FLT128_DIG__ 33 2025-05-07T19:44:48.2533603Z #define __INT32_C(c) c 2025-05-07T19:44:48.2533874Z #define __DEC64_EPSILON__ 1E-15DD 2025-05-07T19:44:48.2534155Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:44:48.2534461Z #define __DEC128_MIN_EXP__ (-6142) 2025-05-07T19:44:48.2534741Z #define __INT_FAST32_TYPE__ long int 2025-05-07T19:44:48.2535079Z #define __UINT_LEAST16_TYPE__ short unsigned int 2025-05-07T19:44:48.2535504Z #define unix 1 2025-05-07T19:44:48.2535758Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:44:48.2536031Z #define __cpp_rtti 199711L 2025-05-07T19:44:48.2536331Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:44:48.2536650Z #define __UINT64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2536992Z #define __FLT_IS_IEC_60559__ 2 2025-05-07T19:44:48.2537340Z #define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE" 2025-05-07T19:44:48.2537677Z #define __FLT64X_DIG__ 18 2025-05-07T19:44:48.2537962Z #define __INT8_TYPE__ signed char 2025-05-07T19:44:48.2538249Z #define __cpp_digit_separators 201309L 2025-05-07T19:44:48.2538554Z #define __ELF__ 1 2025-05-07T19:44:48.2538787Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:44:48.2539094Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:44:48.2539374Z #define __FLT_RADIX__ 2 2025-05-07T19:44:48.2539645Z #define __INT_LEAST16_TYPE__ short int 2025-05-07T19:44:48.2539997Z #define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L 2025-05-07T19:44:48.2540388Z #define __UINTMAX_C(c) c ## UL 2025-05-07T19:44:48.2540686Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:44:48.2540965Z #define __k8 1 2025-05-07T19:44:48.2541280Z #define __FLT32X_MIN__ 
2.22507385850720138309023271733240406e-308F32x 2025-05-07T19:44:48.2541656Z #define __SIG_ATOMIC_MAX__ 0x7fffffff 2025-05-07T19:44:48.2541977Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:44:48.2542279Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:44:48.2542568Z #define __LDBL_DIG__ 18 2025-05-07T19:44:48.2542809Z #define __FLT64_IS_IEC_60559__ 2 2025-05-07T19:44:48.2543093Z #define __x86_64__ 1 2025-05-07T19:44:48.2543426Z #define __FLT32X_MIN_EXP__ (-1021) 2025-05-07T19:44:48.2543730Z #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF 2025-05-07T19:44:48.2544271Z #define __INT_FAST16_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2544641Z #define __FLT64_DIG__ 15 2025-05-07T19:44:48.2544965Z #define __UINT_FAST32_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2545345Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:44:48.2545723Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2546008Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:44:48.2546321Z #define __LONG_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2546637Z #define __FLT64X_HAS_DENORM__ 1 2025-05-07T19:44:48.2547050Z #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL 2025-05-07T19:44:48.2547499Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:44:48.2547813Z #define __GNUC_EXECUTION_CHARSET_NAME "UTF-8" 2025-05-07T19:44:48.2548190Z #define __cpp_unicode_literals 200710L 2025-05-07T19:44:48.2548529Z #define __UINT_FAST16_TYPE__ long unsigned int 2025-05-07T19:44:48.2548901Z #define __DEC64_MAX__ 9.999999999999999E384DD 2025-05-07T19:44:48.2549228Z #define __INT_FAST32_WIDTH__ 64 2025-05-07T19:44:48.2549557Z #define __CHAR16_TYPE__ short unsigned int 2025-05-07T19:44:48.2549887Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:44:48.2550207Z #define __SIZE_WIDTH__ 64 2025-05-07T19:44:48.2550497Z #define __SEG_FS 1 2025-05-07T19:44:48.2550829Z #define __INT_LEAST16_MAX__ 0x7fff 2025-05-07T19:44:48.2551151Z #define __DEC64_MANT_DIG__ 16 2025-05-07T19:44:48.2551451Z 
#define __INT64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2551789Z #define __SEG_GS 1 2025-05-07T19:44:48.2552132Z #define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32 2025-05-07T19:44:48.2552579Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:44:48.2552891Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:44:48.2553230Z #define __INT16_TYPE__ short int 2025-05-07T19:44:48.2553564Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:44:48.2553900Z #define __cpp_structured_bindings 201606L 2025-05-07T19:44:48.2554248Z #define __SIZEOF_INT__ 4 2025-05-07T19:44:48.2554520Z #define __DEC32_MAX_EXP__ 97 2025-05-07T19:44:48.2554833Z #define __INT_FAST8_MAX__ 0x7f 2025-05-07T19:44:48.2555202Z #define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:48.2555711Z #define __INTPTR_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2556055Z #define __cpp_sized_deallocation 201309L 2025-05-07T19:44:48.2556526Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:44:48.2556835Z #define linux 1 2025-05-07T19:44:48.2557102Z #define __FLT64_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2557417Z #define __FLT32_MIN_10_EXP__ (-37) 2025-05-07T19:44:48.2557700Z #define __EXCEPTIONS 1 2025-05-07T19:44:48.2557981Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:44:48.2558252Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:44:48.2558567Z #define __cpp_range_based_for 201603L 2025-05-07T19:44:48.2558872Z #define __FLT64_HAS_INFINITY__ 1 2025-05-07T19:44:48.2559263Z #define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:44:48.2559662Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16 2025-05-07T19:44:48.2560051Z #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) 2025-05-07T19:44:48.2560423Z #define __code_model_small__ 1 2025-05-07T19:44:48.2560708Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:44:48.2561050Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:44:48.2561358Z #define __DEC32_MANT_DIG__ 7 
2025-05-07T19:44:48.2561670Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:44:48.2561962Z #define __k8__ 1 2025-05-07T19:44:48.2562212Z #define __INTPTR_TYPE__ long int 2025-05-07T19:44:48.2562494Z #define __UINT16_TYPE__ short unsigned int 2025-05-07T19:44:48.2562812Z #define __WCHAR_TYPE__ int 2025-05-07T19:44:48.2563059Z #define __pic__ 2 2025-05-07T19:44:48.2563339Z #define __UINTPTR_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2563673Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:44:48.2563942Z #define __cpp_decltype 200707L 2025-05-07T19:44:48.2564261Z #define __INT_FAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2564592Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:44:48.2564983Z #define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:48.2565351Z #define __FLT64X_MAX_EXP__ 16384 2025-05-07T19:44:48.2565681Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:44:48.2566014Z #define __cpp_inline_variables 201606L 2025-05-07T19:44:48.2566340Z #define __INT_MAX__ 0x7fffffff 2025-05-07T19:44:48.2566597Z #define __linux__ 1 2025-05-07T19:44:48.2566851Z #define __INT64_TYPE__ long int 2025-05-07T19:44:48.2567146Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:44:48.2567408Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:44:48.2567711Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:44:48.2567999Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:44:48.2568345Z #define __SIZEOF_FLOAT128__ 16 2025-05-07T19:44:48.2568643Z #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2568981Z #define __DEC64_MIN__ 1E-383DD 2025-05-07T19:44:48.2569251Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:44:48.2569573Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:44:48.2569900Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:44:48.2570231Z #define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:44:48.2570623Z #define __SSE__ 1 2025-05-07T19:44:48.2570942Z #define 
__LDBL_MIN_EXP__ (-16381) 2025-05-07T19:44:48.2571317Z #define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:48.2571676Z #define __amd64__ 1 2025-05-07T19:44:48.2571935Z #define __WINT_WIDTH__ 32 2025-05-07T19:44:48.2572194Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:44:48.2572496Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:44:48.2572766Z #define __FLT32X_MAX_10_EXP__ 308 2025-05-07T19:44:48.2573063Z #define __SIZEOF_INT128__ 16 2025-05-07T19:44:48.2573349Z #define __FLT64X_IS_IEC_60559__ 2 2025-05-07T19:44:48.2573622Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:44:48.2573910Z #define __ATOMIC_RELAXED 0 2025-05-07T19:44:48.2574257Z #define __DBL_EPSILON__ double(2.22044604925031308084726333618164062e-16L) 2025-05-07T19:44:48.2574740Z #define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128 2025-05-07T19:44:48.2575104Z #define _LP64 1 2025-05-07T19:44:48.2575345Z #define __UINT8_C(c) c 2025-05-07T19:44:48.2575644Z #define __FLT64_MAX_EXP__ 1024 2025-05-07T19:44:48.2575947Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:44:48.2576218Z #define __SIZEOF_WCHAR_T__ 4 2025-05-07T19:44:48.2576510Z #define __GNUC_PATCHLEVEL__ 0 2025-05-07T19:44:48.2576902Z #define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:44:48.2577370Z #define __FLT64_NORM_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:44:48.2577773Z #define __FLT128_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2578069Z #define __INTMAX_MAX__ 0x7fffffffffffffffL 2025-05-07T19:44:48.2578404Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:44:48.2578712Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:44:48.2579113Z #define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x 2025-05-07T19:44:48.2579511Z #define __STDCPP_THREADS__ 1 2025-05-07T19:44:48.2579777Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:44:48.2580070Z #define __FLT64_HAS_DENORM__ 1 2025-05-07T19:44:48.2580413Z #define 
__FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32 2025-05-07T19:44:48.2580819Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:44:48.2581081Z #define __STDC_UTF_32__ 1 2025-05-07T19:44:48.2581359Z #define __INT_FAST8_WIDTH__ 8 2025-05-07T19:44:48.2581618Z #define __FXSR__ 1 2025-05-07T19:44:48.2581959Z #define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:48.2582426Z #define __DBL_NORM_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:44:48.2582864Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:44:48.2583201Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:44:48.2583540Z #define __cpp_runtime_arrays 198712L 2025-05-07T19:44:48.2584041Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:44:48.2584414Z #define __UINT32_C(c) c ## U 2025-05-07T19:44:48.2584736Z #define __cpp_alias_templates 200704L 2025-05-07T19:44:48.2585130Z #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F 2025-05-07T19:44:48.2585572Z #define __FLT128_IS_IEC_60559__ 2 2025-05-07T19:44:48.2585877Z #define __INT8_MAX__ 0x7f 2025-05-07T19:44:48.2586171Z #define __LONG_WIDTH__ 64 2025-05-07T19:44:48.2586454Z #define __PIC__ 2 2025-05-07T19:44:48.2586726Z #define __UINT_FAST32_TYPE__ long unsigned int 2025-05-07T19:44:48.2587185Z #define __FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:44:48.2587614Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:44:48.2588001Z #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:44:48.2588383Z #define __cpp_constexpr 201603L 2025-05-07T19:44:48.2588693Z #define __SSE2__ 1 2025-05-07T19:44:48.2588949Z #define __cpp_deduction_guides 201703L 2025-05-07T19:44:48.2589306Z #define __INT32_TYPE__ int 2025-05-07T19:44:48.2589642Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:44:48.2589935Z #define __cpp_exceptions 199711L 2025-05-07T19:44:48.2590274Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:44:48.2590649Z #define 
__FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64 2025-05-07T19:44:48.2591164Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:44:48.2591465Z #define __INTMAX_TYPE__ long int 2025-05-07T19:44:48.2591792Z #define __DEC128_MAX_EXP__ 6145 2025-05-07T19:44:48.2592125Z #define __FLT32X_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2592432Z #define __ATOMIC_CONSUME 1 2025-05-07T19:44:48.2592738Z #define __GNUC_MINOR__ 4 2025-05-07T19:44:48.2593024Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:44:48.2593379Z #define __INT_FAST16_WIDTH__ 64 2025-05-07T19:44:48.2593700Z #define __UINTMAX_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2594059Z #define __PIE__ 2 2025-05-07T19:44:48.2594421Z #define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x 2025-05-07T19:44:48.2594890Z #define __cpp_template_template_args 201611L 2025-05-07T19:44:48.2595226Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:44:48.2595626Z #define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L 2025-05-07T19:44:48.2596151Z #define __INT16_C(c) c 2025-05-07T19:44:48.2596436Z #define __STDC__ 1 2025-05-07T19:44:48.2596695Z #define __FLT32X_DIG__ 15 2025-05-07T19:44:48.2596951Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:44:48.2597258Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:44:48.2597516Z #define __FLT32X_MIN_10_EXP__ (-307) 2025-05-07T19:44:48.2597839Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:44:48.2598188Z #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD 2025-05-07T19:44:48.2598550Z #define __DEC128_MANT_DIG__ 34 2025-05-07T19:44:48.2598825Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:44:48.2599148Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:44:48.2599462Z #define __SSE_MATH__ 1 2025-05-07T19:44:48.2599709Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:44:48.2600020Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:44:48.2600332Z #define __FLT128_DECIMAL_DIG__ 36 2025-05-07T19:44:48.2600643Z #define 
__GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:44:48.2600938Z #define __FLT32_HAS_QUIET_NAN__ 1 2025-05-07T19:44:48.2601247Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:44:48.2601559Z #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL 2025-05-07T19:44:48.2601984Z #define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:44:48.2602388Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:44:48.2602694Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:44:48.2603019Z #define _GNU_SOURCE 1 2025-05-07T19:44:48.2603273Z #define __cpp_init_captures 201304L 2025-05-07T19:44:48.2603585Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:44:48.2603840Z #define __ATOMIC_RELEASE 3 2025-05-07T19:44:48.2604031Z 2025-05-07T19:44:48.3057028Z 2025-05-07T19:44:48.3057623Z + conda run -n build_binary c++ --version 2025-05-07T19:44:48.3057951Z 2025-05-07T19:44:50.1045566Z c++ (conda-forge gcc 11.4.0-13) 11.4.0 2025-05-07T19:44:50.1046688Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:44:50.1048058Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:44:50.1049779Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:44:50.1050838Z 2025-05-07T19:44:50.1050853Z 2025-05-07T19:44:50.1614756Z 2025-05-07T19:44:50.1616227Z [INFO] Printing the default version of the C standard used by the compiler ... 2025-05-07T19:44:50.1616825Z + conda run -n build_binary cc -dM -E - < /dev/null | grep __STDC_VERSION__ 2025-05-07T19:44:50.1617165Z 2025-05-07T19:44:52.0071970Z #define __STDC_VERSION__ 201710L 2025-05-07T19:44:52.0072605Z 2025-05-07T19:44:52.0073303Z [INFO] Printing the default version of the C++ standard used by the compiler ... 
2025-05-07T19:44:52.0073900Z + conda run -n build_binary c++ -dM -E -x c++ - < /dev/null | grep __cplusplus 2025-05-07T19:44:52.0074222Z 2025-05-07T19:44:53.8783283Z #define __cplusplus 201703L 2025-05-07T19:44:53.8784197Z 2025-05-07T19:44:53.8784630Z [INSTALL] Successfully installed C/C++ compilers 2025-05-07T19:44:53.8849712Z ##[group]Run . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:44:53.8850172Z . $PRELUDE; install_build_tools $BUILD_ENV 2025-05-07T19:44:53.8851291Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:44:53.8851608Z env: 2025-05-07T19:44:53.8851837Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:44:53.8852121Z BUILD_ENV: build_binary 2025-05-07T19:44:53.8852361Z BUILD_TARGET: genai 2025-05-07T19:44:53.8852576Z BUILD_VARIANT: cuda 2025-05-07T19:44:53.8852806Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:44:53.8853039Z ##[endgroup] 2025-05-07T19:44:54.3447198Z ################################################################################ 2025-05-07T19:44:54.3448241Z # Install Build Tools 2025-05-07T19:44:54.3448932Z # 2025-05-07T19:44:54.3469953Z # [2025-05-07T19:44:54.346Z] + install_build_tools build_binary 2025-05-07T19:44:54.3470397Z ################################################################################ 2025-05-07T19:44:54.3470731Z 2025-05-07T19:44:54.3492213Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:44:54.4338240Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:44:54.4340885Z [INSTALL] Installing build tools ... 
2025-05-07T19:44:54.4368606Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y auditwheel bazel cmake>=3.30 hypothesis jinja2 make ncurses ninja openblas patchelf rhash scikit-build wheel pyyaml 2025-05-07T19:44:55.1465494Z Channels: 2025-05-07T19:44:55.1466138Z - conda-forge 2025-05-07T19:44:55.1466790Z Platform: linux-64 2025-05-07T19:44:58.1230980Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:45:01.5424489Z Solving environment: \ | / - done 2025-05-07T19:45:01.5947475Z 2025-05-07T19:45:01.5948575Z ## Package Plan ## 2025-05-07T19:45:01.5949060Z 2025-05-07T19:45:01.5949694Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:01.5950600Z 2025-05-07T19:45:01.5950921Z added / updated specs: 2025-05-07T19:45:01.5955400Z - auditwheel 2025-05-07T19:45:01.5956029Z - bazel 2025-05-07T19:45:01.5956707Z - cmake[version='>=3.30'] 2025-05-07T19:45:01.5957093Z - hypothesis 2025-05-07T19:45:01.5957332Z - jinja2 2025-05-07T19:45:01.5957542Z - make 2025-05-07T19:45:01.5957763Z - ncurses 2025-05-07T19:45:01.5957966Z - ninja 2025-05-07T19:45:01.5958180Z - openblas 2025-05-07T19:45:01.5958395Z - patchelf 2025-05-07T19:45:01.5958630Z - pyyaml 2025-05-07T19:45:01.5958835Z - rhash 2025-05-07T19:45:01.5959059Z - scikit-build 2025-05-07T19:45:01.5959278Z - wheel 2025-05-07T19:45:01.5959411Z 2025-05-07T19:45:01.5959416Z 2025-05-07T19:45:01.5959539Z The following packages will be downloaded: 2025-05-07T19:45:01.5959758Z 2025-05-07T19:45:01.5959894Z package | build 2025-05-07T19:45:01.5960218Z ---------------------------|----------------- 2025-05-07T19:45:01.5960632Z alsa-lib-1.2.14 | hb9d3cd8_0 553 KB conda-forge 2025-05-07T19:45:01.5961061Z attrs-25.3.0 | pyh71513ae_0 56 KB conda-forge 2025-05-07T19:45:01.5961522Z auditwheel-6.2.0 | pyha804496_1 40 KB conda-forge 2025-05-07T19:45:01.5961943Z bazel-7.5.0 | h96810dc_2 47.4 MB conda-forge 2025-05-07T19:45:01.5962364Z c-ares-1.34.5 | hb9d3cd8_0 
202 KB conda-forge 2025-05-07T19:45:01.5962779Z cairo-1.18.4 | h3394656_0 955 KB conda-forge 2025-05-07T19:45:01.5963177Z click-8.1.8 | pyh707e725_0 83 KB conda-forge 2025-05-07T19:45:01.5963597Z cmake-4.0.2 | h74e3db0_0 19.4 MB conda-forge 2025-05-07T19:45:01.5964005Z distro-1.9.0 | pyhd8ed1ab_1 41 KB conda-forge 2025-05-07T19:45:01.5964763Z exceptiongroup-1.2.2 | pyhd8ed1ab_1 20 KB conda-forge 2025-05-07T19:45:01.5982932Z font-ttf-dejavu-sans-mono-2.37| hab24e00_0 388 KB conda-forge 2025-05-07T19:45:01.5984269Z font-ttf-inconsolata-3.000 | h77eed37_0 94 KB conda-forge 2025-05-07T19:45:01.5984852Z font-ttf-source-code-pro-2.038| h77eed37_0 684 KB conda-forge 2025-05-07T19:45:01.5985424Z font-ttf-ubuntu-0.83 | h77eed37_3 1.5 MB conda-forge 2025-05-07T19:45:01.5985918Z fontconfig-2.15.0 | h7e30c49_1 259 KB conda-forge 2025-05-07T19:45:01.5986463Z fonts-conda-ecosystem-1 | 0 4 KB conda-forge 2025-05-07T19:45:01.5986983Z fonts-conda-forge-1 | 0 4 KB conda-forge 2025-05-07T19:45:01.5987488Z freetype-2.13.3 | ha770c72_1 168 KB conda-forge 2025-05-07T19:45:01.5987970Z giflib-5.2.2 | hd590300_0 75 KB conda-forge 2025-05-07T19:45:01.5988431Z graphite2-1.3.13 | h59595ed_1003 95 KB conda-forge 2025-05-07T19:45:01.5988921Z harfbuzz-11.0.0 | h76408a6_0 1.6 MB conda-forge 2025-05-07T19:45:01.5989408Z hypothesis-6.131.14 | pyha770c72_0 348 KB conda-forge 2025-05-07T19:45:01.5989896Z icu-75.1 | he02047a_0 11.6 MB conda-forge 2025-05-07T19:45:01.5990432Z ijar-7.5.0 | h5888daf_0 114 KB conda-forge 2025-05-07T19:45:01.5990833Z jinja2-3.1.6 | pyhd8ed1ab_0 110 KB conda-forge 2025-05-07T19:45:01.5991289Z keyutils-1.6.1 | h166bdaf_0 115 KB conda-forge 2025-05-07T19:45:01.5991694Z krb5-1.21.3 | h659f571_0 1.3 MB conda-forge 2025-05-07T19:45:01.5992111Z lcms2-2.17 | h717163a_0 242 KB conda-forge 2025-05-07T19:45:01.5992507Z lerc-4.0.0 | h0aef613_1 258 KB conda-forge 2025-05-07T19:45:01.5992986Z libabseil-20250127.1 | cxx17_hbbce691_0 1.3 MB conda-forge 2025-05-07T19:45:01.5993469Z 
libcups-2.3.3 | h4637d8d_4 4.3 MB conda-forge 2025-05-07T19:45:01.5993893Z libcurl-8.13.0 | h332b0f4_0 428 KB conda-forge 2025-05-07T19:45:01.5994345Z libdeflate-1.23 | h86f0d12_0 71 KB conda-forge 2025-05-07T19:45:01.5994810Z libedit-3.1.20250104 | pl5321h7949ede_0 132 KB conda-forge 2025-05-07T19:45:01.5995276Z libev-4.33 | hd590300_2 110 KB conda-forge 2025-05-07T19:45:01.5995699Z libfreetype-2.13.3 | ha770c72_1 8 KB conda-forge 2025-05-07T19:45:01.5996179Z libfreetype6-2.13.3 | h48d6fc4_1 371 KB conda-forge 2025-05-07T19:45:01.5996638Z libgfortran-15.1.0 | h69a702a_2 34 KB conda-forge 2025-05-07T19:45:01.5997095Z libgfortran5-15.1.0 | hcea5267_2 1.5 MB conda-forge 2025-05-07T19:45:01.5997554Z libglib-2.84.0 | h2ff4ddf_0 3.8 MB conda-forge 2025-05-07T19:45:01.5997976Z libgrpc-1.71.0 | h8e591d7_1 7.6 MB conda-forge 2025-05-07T19:45:01.5998596Z libiconv-1.18 | h4ce23a2_1 696 KB conda-forge 2025-05-07T19:45:01.5999081Z libjpeg-turbo-3.1.0 | hb9d3cd8_0 614 KB conda-forge 2025-05-07T19:45:01.5999546Z liblzma-5.8.1 | hb9d3cd8_1 110 KB conda-forge 2025-05-07T19:45:01.6000031Z liblzma-devel-5.8.1 | hb9d3cd8_1 431 KB conda-forge 2025-05-07T19:45:01.6000609Z libnghttp2-1.64.0 | h161d5f1_0 632 KB conda-forge 2025-05-07T19:45:01.6001102Z libopenblas-0.3.29 |pthreads_h94d23a6_0 5.6 MB conda-forge 2025-05-07T19:45:01.6001554Z libpng-1.6.47 | h943b412_0 282 KB conda-forge 2025-05-07T19:45:01.6002138Z libprotobuf-5.29.3 | h501fc15_1 3.2 MB conda-forge 2025-05-07T19:45:01.6002625Z libre2-11-2024.07.02 | hba17884_3 205 KB conda-forge 2025-05-07T19:45:01.6003122Z libssh2-1.11.1 | hcf80075_0 298 KB conda-forge 2025-05-07T19:45:01.6003549Z libtiff-4.7.0 | hd9ff511_4 419 KB conda-forge 2025-05-07T19:45:01.6003958Z libuv-1.50.0 | hb9d3cd8_0 870 KB conda-forge 2025-05-07T19:45:01.6004416Z libwebp-base-1.5.0 | h851e524_0 420 KB conda-forge 2025-05-07T19:45:01.6004839Z libxcb-1.17.0 | h8a09558_0 387 KB conda-forge 2025-05-07T19:45:01.6005272Z libzlib-1.3.1 | hb9d3cd8_2 60 KB conda-forge 
2025-05-07T19:45:01.6005704Z make-4.4.1 | hb9d3cd8_2 501 KB conda-forge 2025-05-07T19:45:01.6006138Z markupsafe-3.0.2 | py312h178313f_1 24 KB conda-forge 2025-05-07T19:45:01.6006593Z ncurses-6.5 | h2d0b736_3 871 KB conda-forge 2025-05-07T19:45:01.6007003Z ninja-1.12.1 | hff21bea_1 158 KB conda-forge 2025-05-07T19:45:01.6007462Z openblas-0.3.29 |pthreads_h6ec200e_0 5.8 MB conda-forge 2025-05-07T19:45:01.6007934Z openjdk-23.0.2 | h53dfc1b_2 181.4 MB conda-forge 2025-05-07T19:45:01.6008361Z packaging-25.0 | pyh29332c3_1 61 KB conda-forge 2025-05-07T19:45:01.6009212Z patchelf-0.18.0 | h3f2d84a_2 133 KB conda-forge 2025-05-07T19:45:01.6009656Z pcre2-10.44 | hc749103_2 934 KB conda-forge 2025-05-07T19:45:01.6010102Z pixman-0.46.0 | h29eaf8c_0 389 KB conda-forge 2025-05-07T19:45:01.6010555Z pthread-stubs-0.4 | hb9d3cd8_1002 8 KB conda-forge 2025-05-07T19:45:01.6011050Z pyelftools-0.32 | pyh707e725_1 146 KB conda-forge 2025-05-07T19:45:01.6011513Z pyyaml-6.0.2 | py312h178313f_2 202 KB conda-forge 2025-05-07T19:45:01.6011941Z re2-2024.07.02 | h9925aae_3 26 KB conda-forge 2025-05-07T19:45:01.6012371Z rhash-1.4.5 | hb9d3cd8_0 183 KB conda-forge 2025-05-07T19:45:01.6012817Z scikit-build-0.18.1 | pyhae55e72_2 114 KB conda-forge 2025-05-07T19:45:01.6013299Z singlejar-7.5.0 | h0e684df_1 122 KB conda-forge 2025-05-07T19:45:01.6013796Z sortedcontainers-2.4.0 | pyhd8ed1ab_1 28 KB conda-forge 2025-05-07T19:45:01.6014265Z sqlite-3.46.0 | h6d4b2fc_0 840 KB conda-forge 2025-05-07T19:45:01.6014699Z tk-8.6.13 |noxft_h4845f30_101 3.2 MB conda-forge 2025-05-07T19:45:01.6015120Z tomli-2.2.1 | pyhd8ed1ab_1 19 KB conda-forge 2025-05-07T19:45:01.6015556Z wheel-0.45.1 | pyhd8ed1ab_1 61 KB conda-forge 2025-05-07T19:45:01.6016002Z xorg-libice-1.1.2 | hb9d3cd8_0 57 KB conda-forge 2025-05-07T19:45:01.6016468Z xorg-libsm-1.2.6 | he73a12e_0 27 KB conda-forge 2025-05-07T19:45:01.6016940Z xorg-libx11-1.8.12 | h4f16b4b_0 816 KB conda-forge 2025-05-07T19:45:01.6017394Z xorg-libxau-1.0.12 | hb9d3cd8_0 14 
KB conda-forge 2025-05-07T19:45:01.6017878Z xorg-libxdmcp-1.1.5 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:01.6018346Z xorg-libxext-1.3.6 | hb9d3cd8_0 49 KB conda-forge 2025-05-07T19:45:01.6018834Z xorg-libxfixes-6.0.1 | hb9d3cd8_0 19 KB conda-forge 2025-05-07T19:45:01.6019296Z xorg-libxi-1.8.2 | hb9d3cd8_0 46 KB conda-forge 2025-05-07T19:45:01.6019930Z xorg-libxrandr-1.5.4 | hb9d3cd8_0 29 KB conda-forge 2025-05-07T19:45:01.6020442Z xorg-libxrender-0.9.12 | hb9d3cd8_0 32 KB conda-forge 2025-05-07T19:45:01.6021014Z xorg-libxt-1.3.1 | hb9d3cd8_0 371 KB conda-forge 2025-05-07T19:45:01.6021490Z xorg-libxtst-1.2.5 | hb9d3cd8_3 32 KB conda-forge 2025-05-07T19:45:01.6022140Z xz-5.8.1 | hbcc6ac9_1 23 KB conda-forge 2025-05-07T19:45:01.6022553Z xz-gpl-tools-5.8.1 | hbcc6ac9_1 33 KB conda-forge 2025-05-07T19:45:01.6022980Z xz-tools-5.8.1 | hb9d3cd8_1 94 KB conda-forge 2025-05-07T19:45:01.6023445Z yaml-0.2.5 | h7f98852_2 87 KB conda-forge 2025-05-07T19:45:01.6024026Z zlib-1.3.1 | hb9d3cd8_2 90 KB conda-forge 2025-05-07T19:45:01.6024522Z zstd-1.5.7 | hb8e6e7a_2 554 KB conda-forge 2025-05-07T19:45:01.6024933Z ------------------------------------------------------------ 2025-05-07T19:45:01.6025294Z Total: 318.8 MB 2025-05-07T19:45:01.6025540Z 2025-05-07T19:45:01.6025675Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:01.6025908Z 2025-05-07T19:45:01.6026151Z alsa-lib conda-forge/linux-64::alsa-lib-1.2.14-hb9d3cd8_0 2025-05-07T19:45:01.6026604Z attrs conda-forge/noarch::attrs-25.3.0-pyh71513ae_0 2025-05-07T19:45:01.6027088Z auditwheel conda-forge/noarch::auditwheel-6.2.0-pyha804496_1 2025-05-07T19:45:01.6027549Z bazel conda-forge/linux-64::bazel-7.5.0-h96810dc_2 2025-05-07T19:45:01.6027998Z c-ares conda-forge/linux-64::c-ares-1.34.5-hb9d3cd8_0 2025-05-07T19:45:01.6028443Z cairo conda-forge/linux-64::cairo-1.18.4-h3394656_0 2025-05-07T19:45:01.6028869Z click conda-forge/noarch::click-8.1.8-pyh707e725_0 2025-05-07T19:45:01.6029318Z cmake 
conda-forge/linux-64::cmake-4.0.2-h74e3db0_0 2025-05-07T19:45:01.6029824Z distro conda-forge/noarch::distro-1.9.0-pyhd8ed1ab_1 2025-05-07T19:45:01.6030358Z exceptiongroup conda-forge/noarch::exceptiongroup-1.2.2-pyhd8ed1ab_1 2025-05-07T19:45:01.6031003Z font-ttf-dejavu-s~ conda-forge/noarch::font-ttf-dejavu-sans-mono-2.37-hab24e00_0 2025-05-07T19:45:01.6031642Z font-ttf-inconsol~ conda-forge/noarch::font-ttf-inconsolata-3.000-h77eed37_0 2025-05-07T19:45:01.6032288Z font-ttf-source-c~ conda-forge/noarch::font-ttf-source-code-pro-2.038-h77eed37_0 2025-05-07T19:45:01.6032889Z font-ttf-ubuntu conda-forge/noarch::font-ttf-ubuntu-0.83-h77eed37_3 2025-05-07T19:45:01.6033433Z fontconfig conda-forge/linux-64::fontconfig-2.15.0-h7e30c49_1 2025-05-07T19:45:01.6033967Z fonts-conda-ecosy~ conda-forge/noarch::fonts-conda-ecosystem-1-0 2025-05-07T19:45:01.6034479Z fonts-conda-forge conda-forge/noarch::fonts-conda-forge-1-0 2025-05-07T19:45:01.6034981Z freetype conda-forge/linux-64::freetype-2.13.3-ha770c72_1 2025-05-07T19:45:01.6035441Z giflib conda-forge/linux-64::giflib-5.2.2-hd590300_0 2025-05-07T19:45:01.6035934Z graphite2 conda-forge/linux-64::graphite2-1.3.13-h59595ed_1003 2025-05-07T19:45:01.6036552Z harfbuzz conda-forge/linux-64::harfbuzz-11.0.0-h76408a6_0 2025-05-07T19:45:01.6037030Z hypothesis conda-forge/noarch::hypothesis-6.131.14-pyha770c72_0 2025-05-07T19:45:01.6037497Z icu conda-forge/linux-64::icu-75.1-he02047a_0 2025-05-07T19:45:01.6037893Z ijar conda-forge/linux-64::ijar-7.5.0-h5888daf_0 2025-05-07T19:45:01.6038335Z jinja2 conda-forge/noarch::jinja2-3.1.6-pyhd8ed1ab_0 2025-05-07T19:45:01.6038780Z keyutils conda-forge/linux-64::keyutils-1.6.1-h166bdaf_0 2025-05-07T19:45:01.6039231Z krb5 conda-forge/linux-64::krb5-1.21.3-h659f571_0 2025-05-07T19:45:01.6039783Z lcms2 conda-forge/linux-64::lcms2-2.17-h717163a_0 2025-05-07T19:45:01.6040185Z lerc conda-forge/linux-64::lerc-4.0.0-h0aef613_1 2025-05-07T19:45:01.6040738Z libabseil 
conda-forge/linux-64::libabseil-20250127.1-cxx17_hbbce691_0 2025-05-07T19:45:01.6041239Z libcups conda-forge/linux-64::libcups-2.3.3-h4637d8d_4 2025-05-07T19:45:01.6041709Z libcurl conda-forge/linux-64::libcurl-8.13.0-h332b0f4_0 2025-05-07T19:45:01.6042191Z libdeflate conda-forge/linux-64::libdeflate-1.23-h86f0d12_0 2025-05-07T19:45:01.6042691Z libedit conda-forge/linux-64::libedit-3.1.20250104-pl5321h7949ede_0 2025-05-07T19:45:01.6043179Z libev conda-forge/linux-64::libev-4.33-hd590300_2 2025-05-07T19:45:01.6043642Z libfreetype conda-forge/linux-64::libfreetype-2.13.3-ha770c72_1 2025-05-07T19:45:01.6044186Z libfreetype6 conda-forge/linux-64::libfreetype6-2.13.3-h48d6fc4_1 2025-05-07T19:45:01.6044905Z libgfortran conda-forge/linux-64::libgfortran-15.1.0-h69a702a_2 2025-05-07T19:45:01.6045426Z libgfortran5 conda-forge/linux-64::libgfortran5-15.1.0-hcea5267_2 2025-05-07T19:45:01.6045946Z libglib conda-forge/linux-64::libglib-2.84.0-h2ff4ddf_0 2025-05-07T19:45:01.6046408Z libgrpc conda-forge/linux-64::libgrpc-1.71.0-h8e591d7_1 2025-05-07T19:45:01.6047105Z libiconv conda-forge/linux-64::libiconv-1.18-h4ce23a2_1 2025-05-07T19:45:01.6047813Z libjpeg-turbo conda-forge/linux-64::libjpeg-turbo-3.1.0-hb9d3cd8_0 2025-05-07T19:45:01.6048308Z liblzma conda-forge/linux-64::liblzma-5.8.1-hb9d3cd8_1 2025-05-07T19:45:01.6048825Z liblzma-devel conda-forge/linux-64::liblzma-devel-5.8.1-hb9d3cd8_1 2025-05-07T19:45:01.6049345Z libnghttp2 conda-forge/linux-64::libnghttp2-1.64.0-h161d5f1_0 2025-05-07T19:45:01.6049906Z libopenblas conda-forge/linux-64::libopenblas-0.3.29-pthreads_h94d23a6_0 2025-05-07T19:45:01.6050447Z libpng conda-forge/linux-64::libpng-1.6.47-h943b412_0 2025-05-07T19:45:01.6050922Z libprotobuf conda-forge/linux-64::libprotobuf-5.29.3-h501fc15_1 2025-05-07T19:45:01.6051445Z libre2-11 conda-forge/linux-64::libre2-11-2024.07.02-hba17884_3 2025-05-07T19:45:01.6051914Z libssh2 conda-forge/linux-64::libssh2-1.11.1-hcf80075_0 2025-05-07T19:45:01.6052381Z libtiff 
conda-forge/linux-64::libtiff-4.7.0-hd9ff511_4 2025-05-07T19:45:01.6052834Z libuv conda-forge/linux-64::libuv-1.50.0-hb9d3cd8_0 2025-05-07T19:45:01.6053306Z libwebp-base conda-forge/linux-64::libwebp-base-1.5.0-h851e524_0 2025-05-07T19:45:01.6053807Z libxcb conda-forge/linux-64::libxcb-1.17.0-h8a09558_0 2025-05-07T19:45:01.6054231Z make conda-forge/linux-64::make-4.4.1-hb9d3cd8_2 2025-05-07T19:45:01.6054836Z markupsafe conda-forge/linux-64::markupsafe-3.0.2-py312h178313f_1 2025-05-07T19:45:01.6055328Z ninja conda-forge/linux-64::ninja-1.12.1-hff21bea_1 2025-05-07T19:45:01.6055806Z openblas conda-forge/linux-64::openblas-0.3.29-pthreads_h6ec200e_0 2025-05-07T19:45:01.6056319Z openjdk conda-forge/linux-64::openjdk-23.0.2-h53dfc1b_2 2025-05-07T19:45:01.6056774Z packaging conda-forge/noarch::packaging-25.0-pyh29332c3_1 2025-05-07T19:45:01.6057253Z patchelf conda-forge/linux-64::patchelf-0.18.0-h3f2d84a_2 2025-05-07T19:45:01.6057684Z pcre2 conda-forge/linux-64::pcre2-10.44-hc749103_2 2025-05-07T19:45:01.6058129Z pixman conda-forge/linux-64::pixman-0.46.0-h29eaf8c_0 2025-05-07T19:45:01.6058629Z pthread-stubs conda-forge/linux-64::pthread-stubs-0.4-hb9d3cd8_1002 2025-05-07T19:45:01.6059139Z pyelftools conda-forge/noarch::pyelftools-0.32-pyh707e725_1 2025-05-07T19:45:01.6059620Z pyyaml conda-forge/linux-64::pyyaml-6.0.2-py312h178313f_2 2025-05-07T19:45:01.6060048Z re2 conda-forge/linux-64::re2-2024.07.02-h9925aae_3 2025-05-07T19:45:01.6060550Z rhash conda-forge/linux-64::rhash-1.4.5-hb9d3cd8_0 2025-05-07T19:45:01.6061043Z scikit-build conda-forge/noarch::scikit-build-0.18.1-pyhae55e72_2 2025-05-07T19:45:01.6063060Z singlejar conda-forge/linux-64::singlejar-7.5.0-h0e684df_1 2025-05-07T19:45:01.6063904Z sortedcontainers conda-forge/noarch::sortedcontainers-2.4.0-pyhd8ed1ab_1 2025-05-07T19:45:01.6064432Z tomli conda-forge/noarch::tomli-2.2.1-pyhd8ed1ab_1 2025-05-07T19:45:01.6064928Z xorg-libice conda-forge/linux-64::xorg-libice-1.1.2-hb9d3cd8_0 2025-05-07T19:45:01.6065445Z 
xorg-libsm conda-forge/linux-64::xorg-libsm-1.2.6-he73a12e_0 2025-05-07T19:45:01.6065939Z xorg-libx11 conda-forge/linux-64::xorg-libx11-1.8.12-h4f16b4b_0 2025-05-07T19:45:01.6066460Z xorg-libxau conda-forge/linux-64::xorg-libxau-1.0.12-hb9d3cd8_0 2025-05-07T19:45:01.6066981Z xorg-libxdmcp conda-forge/linux-64::xorg-libxdmcp-1.1.5-hb9d3cd8_0 2025-05-07T19:45:01.6067538Z xorg-libxext conda-forge/linux-64::xorg-libxext-1.3.6-hb9d3cd8_0 2025-05-07T19:45:01.6068090Z xorg-libxfixes conda-forge/linux-64::xorg-libxfixes-6.0.1-hb9d3cd8_0 2025-05-07T19:45:01.6068613Z xorg-libxi conda-forge/linux-64::xorg-libxi-1.8.2-hb9d3cd8_0 2025-05-07T19:45:01.6069148Z xorg-libxrandr conda-forge/linux-64::xorg-libxrandr-1.5.4-hb9d3cd8_0 2025-05-07T19:45:01.6069717Z xorg-libxrender conda-forge/linux-64::xorg-libxrender-0.9.12-hb9d3cd8_0 2025-05-07T19:45:01.6070268Z xorg-libxt conda-forge/linux-64::xorg-libxt-1.3.1-hb9d3cd8_0 2025-05-07T19:45:01.6070788Z xorg-libxtst conda-forge/linux-64::xorg-libxtst-1.2.5-hb9d3cd8_3 2025-05-07T19:45:01.6071303Z xz-gpl-tools conda-forge/linux-64::xz-gpl-tools-5.8.1-hbcc6ac9_1 2025-05-07T19:45:01.6071808Z xz-tools conda-forge/linux-64::xz-tools-5.8.1-hb9d3cd8_1 2025-05-07T19:45:01.6072240Z yaml conda-forge/linux-64::yaml-0.2.5-h7f98852_2 2025-05-07T19:45:01.6072666Z zstd conda-forge/linux-64::zstd-1.5.7-hb8e6e7a_2 2025-05-07T19:45:01.6072930Z 2025-05-07T19:45:01.6073071Z The following packages will be UPDATED: 2025-05-07T19:45:01.6073292Z 2025-05-07T19:45:01.6073449Z libzlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:01.6074004Z ncurses pkgs/main::ncurses-6.4-h6a678d5_0 --> conda-forge::ncurses-6.5-h2d0b736_3 2025-05-07T19:45:01.6074649Z sqlite pkgs/main::sqlite-3.45.3-h5eee18b_0 --> conda-forge::sqlite-3.46.0-h6d4b2fc_0 2025-05-07T19:45:01.6075339Z wheel pkgs/main/linux-64::wheel-0.45.1-py31~ --> conda-forge/noarch::wheel-0.45.1-pyhd8ed1ab_1 2025-05-07T19:45:01.6075974Z xz pkgs/main::xz-5.6.4-h5eee18b_1 --> conda-forge::xz-5.8.1-hbcc6ac9_1 
2025-05-07T19:45:01.6076616Z zlib 1.2.13-h4ab18f5_6 --> 1.3.1-hb9d3cd8_2 2025-05-07T19:45:01.6076881Z 2025-05-07T19:45:01.6077106Z The following packages will be SUPERSEDED by a higher-priority channel: 2025-05-07T19:45:01.6077428Z 2025-05-07T19:45:01.6077677Z tk pkgs/main::tk-8.6.14-h39e8969_0 --> conda-forge::tk-8.6.13-noxft_h4845f30_101 2025-05-07T19:45:01.6078017Z 2025-05-07T19:45:01.6078063Z 2025-05-07T19:45:01.6078066Z 2025-05-07T19:45:01.6078213Z Downloading and Extracting Packages: ...working... 2025-05-07T19:45:01.6078608Z openjdk-23.0.2 | 181.4 MB | | 0% 2025-05-07T19:45:01.6078843Z 2025-05-07T19:45:01.6079163Z bazel-7.5.0 | 47.4 MB | | 0%  2025-05-07T19:45:01.6079415Z 2025-05-07T19:45:01.6079418Z 2025-05-07T19:45:01.6079616Z cmake-4.0.2 | 19.4 MB | | 0%  2025-05-07T19:45:01.6079855Z 2025-05-07T19:45:01.6079876Z 2025-05-07T19:45:01.6079879Z 2025-05-07T19:45:01.6080081Z icu-75.1 | 11.6 MB | | 0%  2025-05-07T19:45:01.6080313Z 2025-05-07T19:45:01.6080317Z 2025-05-07T19:45:01.6080422Z 2025-05-07T19:45:01.6080427Z 2025-05-07T19:45:01.6087474Z libgrpc-1.71.0 | 7.6 MB | | 0%  2025-05-07T19:45:01.6088405Z 2025-05-07T19:45:01.6088442Z 2025-05-07T19:45:01.6088452Z 2025-05-07T19:45:01.6088463Z 2025-05-07T19:45:01.6088473Z 2025-05-07T19:45:01.6089160Z openblas-0.3.29 | 5.8 MB | | 0%  2025-05-07T19:45:01.6089981Z 2025-05-07T19:45:01.6089992Z 2025-05-07T19:45:01.6090003Z 2025-05-07T19:45:01.6090013Z 2025-05-07T19:45:01.6090024Z 2025-05-07T19:45:01.6090034Z 2025-05-07T19:45:01.6090923Z libopenblas-0.3.29 | 5.6 MB | | 0%  2025-05-07T19:45:01.6091240Z 2025-05-07T19:45:01.6091243Z 2025-05-07T19:45:01.6091246Z 2025-05-07T19:45:01.6091250Z 2025-05-07T19:45:01.6091253Z 2025-05-07T19:45:01.6091256Z 2025-05-07T19:45:01.6091260Z 2025-05-07T19:45:01.6091517Z libcups-2.3.3 | 4.3 MB | | 0%  2025-05-07T19:45:01.6091812Z 2025-05-07T19:45:01.6091820Z 2025-05-07T19:45:01.6091823Z 2025-05-07T19:45:01.6091826Z 2025-05-07T19:45:01.6091830Z 2025-05-07T19:45:01.6091833Z 
2025-05-07T19:45:01.6092101Z libglib-2.84.0 | 3.8 MB | | 0% 2025-05-07T19:45:01.6092832Z libprotobuf-5.29.3 | 3.2 MB | | 0% 2025-05-07T19:45:01.6093867Z tk-8.6.13 | 3.2 MB | | 0% 2025-05-07T19:45:01.6094924Z harfbuzz-11.0.0 | 1.6 MB | | 0% 2025-05-07T19:45:01.6096133Z font-ttf-ubuntu-0.83 | 1.5 MB | | 0% 2025-05-07T19:45:01.6097310Z libgfortran5-15.1.0 | 1.5 MB | | 0% 2025-05-07T19:45:01.6098133Z krb5-1.21.3 | 1.3 MB | | 0% 2025-05-07T19:45:01.6099228Z libabseil-20250127.1 | 1.3 MB | | 0% 2025-05-07T19:45:01.6100200Z cairo-1.18.4 | 955 KB | | 0% 2025-05-07T19:45:01.6101334Z pcre2-10.44 | 934 KB | | 0% 2025-05-07T19:45:01.6102596Z ncurses-6.5 | 871 KB | | 0% 2025-05-07T19:45:01.7837311Z ... (more hidden) ...
2025-05-07T19:45:01.7876326Z libgrpc-1.71.0 | 7.6 MB | | 0% 2025-05-07T19:45:01.8837037Z cmake-4.0.2 | 19.4 MB | | 0% 2025-05-07T19:45:01.8891232Z libgrpc-1.71.0 | 7.6 MB | ##8 | 28% 2025-05-07T19:45:01.8969828Z cmake-4.0.2 | 19.4 MB | 7 | 7% 2025-05-07T19:45:01.9069513Z bazel-7.5.0 | 47.4 MB | | 0% 2025-05-07T19:45:01.9427389Z openjdk-23.0.2 | 181.4 MB | | 0% 2025-05-07T19:45:01.9845796Z icu-75.1 | 11.6 MB | | 0% 2025-05-07T19:45:01.9888688Z libgrpc-1.71.0 | 7.6 MB | ####7 | 47% 2025-05-07T19:45:01.9970419Z cmake-4.0.2 | 19.4 MB | ###5 | 36% 2025-05-07T19:45:02.0069206Z bazel-7.5.0 | 47.4 MB | #4 | 14% 2025-05-07T19:45:02.0428310Z openjdk-23.0.2 | 181.4 MB | 2 | 3% 2025-05-07T19:45:02.0846197Z icu-75.1 | 11.6 MB | #####9 | 60% 2025-05-07T19:45:02.0971742Z libgrpc-1.71.0 | 7.6 MB | ######## | 81% 2025-05-07T19:45:02.1069928Z bazel-7.5.0 | 47.4 MB | ##8 | 29% 2025-05-07T19:45:02.1074023Z openjdk-23.0.2 | 181.4 MB | 6 | 7% 2025-05-07T19:45:02.1530868Z cmake-4.0.2 | 19.4 MB | #####1 | 51% 2025-05-07T19:45:02.1692372Z icu-75.1 | 11.6 MB | #########4 | 95% 2025-05-07T19:45:02.1972480Z libgrpc-1.71.0 | 7.6 MB | ########## | 100% 2025-05-07T19:45:02.2071927Z bazel-7.5.0 | 47.4 MB | ####4 | 44% 2025-05-07T19:45:02.2120381Z openjdk-23.0.2 | 181.4 MB | #2 | 12% 2025-05-07T19:45:02.2281669Z cmake-4.0.2 | 19.4 MB | #######4 | 75% 2025-05-07T19:45:02.2975130Z openblas-0.3.29 | 5.8 MB | | 0% 2025-05-07T19:45:02.3300466Z bazel-7.5.0 | 47.4 MB | #####7 | 58% 2025-05-07T19:45:02.3587905Z openjdk-23.0.2 | 181.4 MB | #5 | 16% 2025-05-07T19:45:02.3634043Z cmake-4.0.2 | 19.4 MB | #########2 | 92% 2025-05-07T19:45:02.3976337Z icu-75.1 | 11.6 MB | ########## | 100% 2025-05-07T19:45:02.4013185Z bazel-7.5.0 | 47.4 MB | #######3 | 74% 2025-05-07T19:45:02.4013778Z openblas-0.3.29 | 5.8 MB | ########## | 100% 2025-05-07T19:45:02.4310654Z openblas-0.3.29 | 5.8 MB | ########## | 100% 2025-05-07T19:45:02.4387154Z libopenblas-0.3.29 | 5.6 MB | | 0% 2025-05-07T19:45:02.4655614Z openjdk-23.0.2 | 181.4 MB | #9 | 19% 2025-05-07T19:45:02.4987236Z libcups-2.3.3 | 4.3 MB | | 0% 2025-05-07T19:45:02.5313538Z bazel-7.5.0 | 47.4 MB | ########7 | 88% 2025-05-07T19:45:02.5890159Z libopenblas-0.3.29 | 5.6 MB | #######3 | 73% 2025-05-07T19:45:02.6050442Z openjdk-23.0.2 | 181.4 MB | ##2 | 23% 2025-05-07T19:45:02.6052001Z libcups-2.3.3 | 4.3 MB | ########## | 100% 2025-05-07T19:45:02.6306467Z libcups-2.3.3 | 4.3 MB | ########## | 100% 2025-05-07T19:45:02.6515521Z libopenblas-0.3.29 | 5.6 MB | ########## | 100% 2025-05-07T19:45:02.6839642Z libglib-2.84.0 | 3.8 MB | | 0%
2025-05-07T19:45:02.6840101Z 2025-05-07T19:45:02.6840106Z 2025-05-07T19:45:02.6840111Z 2025-05-07T19:45:02.6840141Z 2025-05-07T19:45:02.6840144Z 2025-05-07T19:45:02.6963484Z libprotobuf-5.29.3 | 3.2 MB | | 0%  2025-05-07T19:45:02.6971979Z openjdk-23.0.2 | 181.4 MB | ##6 | 27% 2025-05-07T19:45:02.6972258Z 2025-05-07T19:45:02.6972733Z 2025-05-07T19:45:02.7287390Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:02.7288214Z 2025-05-07T19:45:02.7288229Z 2025-05-07T19:45:02.7288241Z 2025-05-07T19:45:02.7288252Z 2025-05-07T19:45:02.7288263Z 2025-05-07T19:45:02.7288274Z 2025-05-07T19:45:02.7288285Z 2025-05-07T19:45:02.7288296Z 2025-05-07T19:45:02.7565011Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:02.7565349Z 2025-05-07T19:45:02.7565355Z 2025-05-07T19:45:02.7565360Z 2025-05-07T19:45:02.7565365Z 2025-05-07T19:45:02.7565370Z 2025-05-07T19:45:02.7565396Z 2025-05-07T19:45:02.7565401Z 2025-05-07T19:45:02.7565405Z 2025-05-07T19:45:02.7565409Z 2025-05-07T19:45:02.7629101Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:02.7629482Z 2025-05-07T19:45:02.7629486Z 2025-05-07T19:45:02.7629490Z 2025-05-07T19:45:02.7629493Z 2025-05-07T19:45:02.7629497Z 2025-05-07T19:45:02.7629500Z 2025-05-07T19:45:02.7629504Z 2025-05-07T19:45:02.7629507Z 2025-05-07T19:45:02.7629511Z 2025-05-07T19:45:02.7629514Z 2025-05-07T19:45:02.7916896Z tk-8.6.13 | 3.2 MB | | 0%  2025-05-07T19:45:02.7917566Z 2025-05-07T19:45:02.7917579Z 2025-05-07T19:45:02.7917587Z 2025-05-07T19:45:02.7917593Z 2025-05-07T19:45:02.7917597Z 2025-05-07T19:45:02.7917603Z 2025-05-07T19:45:02.7917607Z 2025-05-07T19:45:02.7917625Z 2025-05-07T19:45:02.7917630Z 2025-05-07T19:45:02.7917635Z 2025-05-07T19:45:02.7917694Z 2025-05-07T19:45:02.8189463Z harfbuzz-11.0.0 | 1.6 MB | | 1%  2025-05-07T19:45:02.8189810Z 2025-05-07T19:45:02.8189815Z 2025-05-07T19:45:02.8189968Z 2025-05-07T19:45:02.8189996Z 2025-05-07T19:45:02.8190000Z 2025-05-07T19:45:02.8190005Z 2025-05-07T19:45:02.8190008Z 
2025-05-07T19:45:02.8190012Z 2025-05-07T19:45:02.8190015Z 2025-05-07T19:45:02.8190019Z 2025-05-07T19:45:02.8190022Z 2025-05-07T19:45:02.8190026Z 2025-05-07T19:45:02.8380138Z font-ttf-ubuntu-0.83 | 1.5 MB | 1 | 1%  2025-05-07T19:45:02.8380476Z 2025-05-07T19:45:02.8380688Z 2025-05-07T19:45:02.8380702Z 2025-05-07T19:45:02.8380708Z 2025-05-07T19:45:02.8380714Z 2025-05-07T19:45:02.8380721Z 2025-05-07T19:45:02.8380728Z 2025-05-07T19:45:02.8380732Z 2025-05-07T19:45:02.8380737Z 2025-05-07T19:45:02.8380742Z 2025-05-07T19:45:02.8474913Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:02.8475295Z 2025-05-07T19:45:02.8475301Z 2025-05-07T19:45:02.8475346Z 2025-05-07T19:45:02.8475350Z 2025-05-07T19:45:02.8475355Z 2025-05-07T19:45:02.8475359Z 2025-05-07T19:45:02.8475364Z 2025-05-07T19:45:02.8475385Z 2025-05-07T19:45:02.8475388Z 2025-05-07T19:45:02.8475391Z 2025-05-07T19:45:02.8475395Z 2025-05-07T19:45:02.8567471Z harfbuzz-11.0.0 | 1.6 MB | ########## | 100%  2025-05-07T19:45:02.8567808Z 2025-05-07T19:45:02.8567815Z 2025-05-07T19:45:02.8567822Z 2025-05-07T19:45:02.8567828Z 2025-05-07T19:45:02.8567834Z 2025-05-07T19:45:02.8567839Z 2025-05-07T19:45:02.8567845Z 2025-05-07T19:45:02.8567850Z 2025-05-07T19:45:02.8567867Z 2025-05-07T19:45:02.8567872Z 2025-05-07T19:45:02.8567878Z 2025-05-07T19:45:02.8567883Z 2025-05-07T19:45:02.8625621Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:02.8856244Z openjdk-23.0.2 | 181.4 MB | ##9 | 30% 2025-05-07T19:45:02.8856607Z 2025-05-07T19:45:02.8856790Z 2025-05-07T19:45:02.8856823Z 2025-05-07T19:45:02.8856829Z 2025-05-07T19:45:02.8856834Z 2025-05-07T19:45:02.8856839Z 2025-05-07T19:45:02.8856844Z 2025-05-07T19:45:02.8856865Z 2025-05-07T19:45:02.8856870Z 2025-05-07T19:45:02.8856874Z 2025-05-07T19:45:02.8856905Z 2025-05-07T19:45:02.8856910Z 2025-05-07T19:45:02.8856915Z 2025-05-07T19:45:02.9159283Z libgfortran5-15.1.0 | 1.5 MB | 1 | 1%  2025-05-07T19:45:02.9159642Z 2025-05-07T19:45:02.9159647Z 2025-05-07T19:45:02.9159650Z 
2025-05-07T19:45:02.9159654Z 2025-05-07T19:45:02.9159674Z 2025-05-07T19:45:02.9159677Z 2025-05-07T19:45:02.9159681Z 2025-05-07T19:45:02.9159684Z 2025-05-07T19:45:02.9159687Z 2025-05-07T19:45:02.9159691Z 2025-05-07T19:45:02.9159694Z 2025-05-07T19:45:02.9159698Z 2025-05-07T19:45:02.9159701Z 2025-05-07T19:45:02.9159704Z 2025-05-07T19:45:02.9216417Z krb5-1.21.3 | 1.3 MB | 1 | 1%  2025-05-07T19:45:02.9217330Z 2025-05-07T19:45:02.9217373Z 2025-05-07T19:45:02.9217386Z 2025-05-07T19:45:02.9217397Z 2025-05-07T19:45:02.9217407Z 2025-05-07T19:45:02.9217418Z 2025-05-07T19:45:02.9217462Z 2025-05-07T19:45:02.9217474Z 2025-05-07T19:45:02.9217484Z 2025-05-07T19:45:02.9217494Z 2025-05-07T19:45:02.9217504Z 2025-05-07T19:45:02.9217515Z 2025-05-07T19:45:02.9217525Z 2025-05-07T19:45:02.9217535Z 2025-05-07T19:45:02.9217545Z 2025-05-07T19:45:02.9228147Z libabseil-20250127.1 | 1.3 MB | 1 | 1%  2025-05-07T19:45:02.9228496Z 2025-05-07T19:45:02.9228499Z 2025-05-07T19:45:02.9228503Z 2025-05-07T19:45:02.9231131Z 2025-05-07T19:45:02.9428303Z libgrpc-1.71.0 | 7.6 MB | ########## | 100%  2025-05-07T19:45:02.9429165Z 2025-05-07T19:45:02.9429180Z 2025-05-07T19:45:02.9429191Z 2025-05-07T19:45:02.9429218Z 2025-05-07T19:45:02.9429230Z 2025-05-07T19:45:02.9429240Z 2025-05-07T19:45:02.9429250Z 2025-05-07T19:45:02.9429261Z 2025-05-07T19:45:02.9429271Z 2025-05-07T19:45:02.9429689Z 2025-05-07T19:45:02.9429702Z 2025-05-07T19:45:02.9429713Z 2025-05-07T19:45:02.9429940Z 2025-05-07T19:45:02.9624966Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:02.9625850Z 2025-05-07T19:45:02.9625855Z 2025-05-07T19:45:02.9625858Z 2025-05-07T19:45:02.9625862Z 2025-05-07T19:45:02.9625865Z 2025-05-07T19:45:02.9625869Z 2025-05-07T19:45:02.9625872Z 2025-05-07T19:45:02.9625876Z 2025-05-07T19:45:02.9625879Z 2025-05-07T19:45:02.9625882Z 2025-05-07T19:45:02.9625886Z 2025-05-07T19:45:02.9625889Z 2025-05-07T19:45:02.9625893Z 2025-05-07T19:45:02.9625896Z 2025-05-07T19:45:02.9625900Z 2025-05-07T19:45:02.9626844Z 
libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:02.9636195Z openjdk-23.0.2 | 181.4 MB | ###3 | 34% 2025-05-07T19:45:02.9636952Z 2025-05-07T19:45:02.9636963Z 2025-05-07T19:45:02.9636974Z 2025-05-07T19:45:02.9636984Z 2025-05-07T19:45:02.9637012Z 2025-05-07T19:45:02.9637023Z 2025-05-07T19:45:02.9637033Z 2025-05-07T19:45:02.9637044Z 2025-05-07T19:45:02.9637054Z 2025-05-07T19:45:02.9637078Z 2025-05-07T19:45:02.9637088Z 2025-05-07T19:45:02.9637098Z 2025-05-07T19:45:02.9637108Z 2025-05-07T19:45:02.9637118Z 2025-05-07T19:45:03.0108578Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:03.0109033Z 2025-05-07T19:45:03.0109038Z 2025-05-07T19:45:03.0109042Z 2025-05-07T19:45:03.0109045Z 2025-05-07T19:45:03.0109049Z 2025-05-07T19:45:03.0109052Z 2025-05-07T19:45:03.0109074Z 2025-05-07T19:45:03.0109078Z 2025-05-07T19:45:03.0109081Z 2025-05-07T19:45:03.0109085Z 2025-05-07T19:45:03.0109088Z 2025-05-07T19:45:03.0109092Z 2025-05-07T19:45:03.0109096Z 2025-05-07T19:45:03.0109099Z 2025-05-07T19:45:03.0109114Z 2025-05-07T19:45:03.0109117Z 2025-05-07T19:45:03.0111611Z cairo-1.18.4 | 955 KB | 1 | 2%  2025-05-07T19:45:03.0111926Z 2025-05-07T19:45:03.0111930Z 2025-05-07T19:45:03.0111941Z 2025-05-07T19:45:03.0111945Z 2025-05-07T19:45:03.0111948Z 2025-05-07T19:45:03.0234072Z openblas-0.3.29 | 5.8 MB | ########## | 100%  2025-05-07T19:45:03.0234369Z 2025-05-07T19:45:03.0234373Z 2025-05-07T19:45:03.0234377Z 2025-05-07T19:45:03.0234380Z 2025-05-07T19:45:03.0234384Z 2025-05-07T19:45:03.0234387Z 2025-05-07T19:45:03.0234391Z 2025-05-07T19:45:03.0234394Z 2025-05-07T19:45:03.0234397Z 2025-05-07T19:45:03.0234414Z 2025-05-07T19:45:03.0234417Z 2025-05-07T19:45:03.0234420Z 2025-05-07T19:45:03.0234424Z 2025-05-07T19:45:03.0234427Z 2025-05-07T19:45:03.0234431Z 2025-05-07T19:45:03.0234434Z 2025-05-07T19:45:03.0234437Z 2025-05-07T19:45:03.0392165Z pcre2-10.44 | 934 KB | 1 | 2%  2025-05-07T19:45:03.0392502Z 2025-05-07T19:45:03.0392507Z 2025-05-07T19:45:03.0392511Z 
2025-05-07T19:45:03.0392515Z 2025-05-07T19:45:03.0392531Z 2025-05-07T19:45:03.0392535Z 2025-05-07T19:45:03.0392538Z 2025-05-07T19:45:03.0392542Z 2025-05-07T19:45:03.0392545Z 2025-05-07T19:45:03.0392556Z 2025-05-07T19:45:03.0392559Z 2025-05-07T19:45:03.0392562Z 2025-05-07T19:45:03.0392581Z 2025-05-07T19:45:03.0392585Z 2025-05-07T19:45:03.0392588Z 2025-05-07T19:45:03.0392592Z 2025-05-07T19:45:03.0392595Z 2025-05-07T19:45:03.0393146Z 2025-05-07T19:45:03.0467525Z ncurses-6.5 | 871 KB | 1 | 2%  2025-05-07T19:45:03.0468509Z 2025-05-07T19:45:03.0468523Z 2025-05-07T19:45:03.0468534Z 2025-05-07T19:45:03.0468544Z 2025-05-07T19:45:03.0468554Z 2025-05-07T19:45:03.0468565Z 2025-05-07T19:45:03.0468575Z 2025-05-07T19:45:03.0468585Z 2025-05-07T19:45:03.0468595Z 2025-05-07T19:45:03.0468605Z 2025-05-07T19:45:03.0468634Z 2025-05-07T19:45:03.0468645Z 2025-05-07T19:45:03.0468656Z 2025-05-07T19:45:03.0468666Z 2025-05-07T19:45:03.0468676Z 2025-05-07T19:45:03.0468686Z 2025-05-07T19:45:03.0605452Z cairo-1.18.4 | 955 KB | ########## | 100%  2025-05-07T19:45:03.0605998Z 2025-05-07T19:45:03.0606017Z 2025-05-07T19:45:03.0606021Z 2025-05-07T19:45:03.0606024Z 2025-05-07T19:45:03.0606028Z 2025-05-07T19:45:03.0606031Z 2025-05-07T19:45:03.0606035Z 2025-05-07T19:45:03.0606038Z 2025-05-07T19:45:03.0606041Z 2025-05-07T19:45:03.0606044Z 2025-05-07T19:45:03.0606048Z 2025-05-07T19:45:03.0606051Z 2025-05-07T19:45:03.0606054Z 2025-05-07T19:45:03.0606058Z 2025-05-07T19:45:03.0606061Z 2025-05-07T19:45:03.0606064Z 2025-05-07T19:45:03.0606067Z 2025-05-07T19:45:03.0644707Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:03.0668753Z openjdk-23.0.2 | 181.4 MB | ###7 | 38% 2025-05-07T19:45:03.0669569Z 2025-05-07T19:45:03.0669583Z 2025-05-07T19:45:03.0669594Z 2025-05-07T19:45:03.0669604Z 2025-05-07T19:45:03.0669614Z 2025-05-07T19:45:03.0669653Z 2025-05-07T19:45:03.0669665Z 2025-05-07T19:45:03.0669675Z 2025-05-07T19:45:03.0669685Z 2025-05-07T19:45:03.0669696Z 2025-05-07T19:45:03.0669721Z 
2025-05-07T19:45:03.0669731Z 2025-05-07T19:45:03.0669742Z 2025-05-07T19:45:03.0669752Z 2025-05-07T19:45:03.0669762Z 2025-05-07T19:45:03.0669772Z 2025-05-07T19:45:03.0669782Z 2025-05-07T19:45:03.0669792Z 2025-05-07T19:45:03.0822626Z ncurses-6.5 | 871 KB | ########## | 100%  2025-05-07T19:45:03.0822975Z 2025-05-07T19:45:03.0822980Z 2025-05-07T19:45:03.0822984Z 2025-05-07T19:45:03.1057886Z icu-75.1 | 11.6 MB | ########## | 100%  2025-05-07T19:45:03.1058686Z 2025-05-07T19:45:03.1058700Z 2025-05-07T19:45:03.1058711Z 2025-05-07T19:45:03.1058722Z 2025-05-07T19:45:03.1058732Z 2025-05-07T19:45:03.1058743Z 2025-05-07T19:45:03.1058754Z 2025-05-07T19:45:03.1090384Z libcups-2.3.3 | 4.3 MB | ########## | 100%  2025-05-07T19:45:03.1090716Z 2025-05-07T19:45:03.1090721Z 2025-05-07T19:45:03.1090725Z 2025-05-07T19:45:03.1090728Z 2025-05-07T19:45:03.1090732Z 2025-05-07T19:45:03.1090741Z 2025-05-07T19:45:03.1090744Z 2025-05-07T19:45:03.1090748Z 2025-05-07T19:45:03.1090751Z 2025-05-07T19:45:03.1090755Z 2025-05-07T19:45:03.1090758Z 2025-05-07T19:45:03.1090761Z 2025-05-07T19:45:03.1090765Z 2025-05-07T19:45:03.1090768Z 2025-05-07T19:45:03.1090772Z 2025-05-07T19:45:03.1090775Z 2025-05-07T19:45:03.1090778Z 2025-05-07T19:45:03.1090782Z 2025-05-07T19:45:03.1090785Z 2025-05-07T19:45:03.1461772Z ... (more hidden) ... 2025-05-07T19:45:03.1462092Z 2025-05-07T19:45:03.1462097Z 2025-05-07T19:45:03.1462101Z 2025-05-07T19:45:03.1462105Z 2025-05-07T19:45:03.1462108Z 2025-05-07T19:45:03.1462112Z 2025-05-07T19:45:03.1462115Z 2025-05-07T19:45:03.1462119Z 2025-05-07T19:45:03.1462143Z 2025-05-07T19:45:03.1462147Z 2025-05-07T19:45:03.1462150Z 2025-05-07T19:45:03.1462154Z 2025-05-07T19:45:03.1462168Z 2025-05-07T19:45:03.1462173Z 2025-05-07T19:45:03.1462176Z 2025-05-07T19:45:03.1462179Z 2025-05-07T19:45:03.1462183Z 2025-05-07T19:45:03.1462192Z 2025-05-07T19:45:03.1462195Z 2025-05-07T19:45:03.1699857Z ... (more hidden) ... 
2025-05-07T19:45:03.1700188Z 2025-05-07T19:45:03.1783415Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:03.3198144Z openjdk-23.0.2 | 181.4 MB | ####1 | 41% 2025-05-07T19:45:03.3198425Z 2025-05-07T19:45:03.3198430Z 2025-05-07T19:45:03.3198434Z 2025-05-07T19:45:03.3198438Z 2025-05-07T19:45:03.3198442Z 2025-05-07T19:45:03.3198445Z 2025-05-07T19:45:03.3667606Z libopenblas-0.3.29 | 5.6 MB | ########## | 100%  2025-05-07T19:45:03.4282108Z openjdk-23.0.2 | 181.4 MB | ####4 | 45% 2025-05-07T19:45:03.4282451Z 2025-05-07T19:45:03.4282537Z 2025-05-07T19:45:03.4282542Z 2025-05-07T19:45:03.4282547Z 2025-05-07T19:45:03.4282797Z 2025-05-07T19:45:03.4283325Z 2025-05-07T19:45:03.4283349Z 2025-05-07T19:45:03.4298226Z 2025-05-07T19:45:03.4306513Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:03.4307809Z 2025-05-07T19:45:03.4307815Z 2025-05-07T19:45:03.4307819Z 2025-05-07T19:45:03.4307822Z 2025-05-07T19:45:03.4307826Z 2025-05-07T19:45:03.4307830Z 2025-05-07T19:45:03.4307833Z 2025-05-07T19:45:03.4315235Z 2025-05-07T19:45:03.5155707Z libglib-2.84.0 | 3.8 MB | ########## | 100%  2025-05-07T19:45:03.6155699Z openjdk-23.0.2 | 181.4 MB | ####7 | 48% 2025-05-07T19:45:03.7155801Z openjdk-23.0.2 | 181.4 MB | ##### | 51% 2025-05-07T19:45:03.8158258Z openjdk-23.0.2 | 181.4 MB | #####4 | 54% 2025-05-07T19:45:03.8541366Z openjdk-23.0.2 | 181.4 MB | #####8 | 58% 2025-05-07T19:45:03.8541698Z 2025-05-07T19:45:03.8541831Z 2025-05-07T19:45:03.8541836Z 2025-05-07T19:45:03.8541864Z 2025-05-07T19:45:03.8542040Z 2025-05-07T19:45:03.8542059Z 2025-05-07T19:45:03.8542066Z 2025-05-07T19:45:03.8542071Z 2025-05-07T19:45:03.8542075Z 2025-05-07T19:45:03.8544902Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:03.8545275Z 2025-05-07T19:45:03.8545281Z 2025-05-07T19:45:03.8545285Z 2025-05-07T19:45:03.8545289Z 2025-05-07T19:45:03.8545292Z 2025-05-07T19:45:03.8545296Z 2025-05-07T19:45:03.8545300Z 2025-05-07T19:45:03.8545303Z 2025-05-07T19:45:03.8545316Z 
2025-05-07T19:45:03.9160395Z libprotobuf-5.29.3 | 3.2 MB | ########## | 100%  2025-05-07T19:45:04.0161780Z openjdk-23.0.2 | 181.4 MB | ######1 | 62% 2025-05-07T19:45:04.0774893Z openjdk-23.0.2 | 181.4 MB | ######5 | 65% 2025-05-07T19:45:04.0775492Z 2025-05-07T19:45:04.0775665Z 2025-05-07T19:45:04.0775674Z 2025-05-07T19:45:04.0775680Z 2025-05-07T19:45:04.0775686Z 2025-05-07T19:45:04.0775691Z 2025-05-07T19:45:04.0775696Z 2025-05-07T19:45:04.0775700Z 2025-05-07T19:45:04.0775727Z 2025-05-07T19:45:04.0775731Z 2025-05-07T19:45:04.0775735Z 2025-05-07T19:45:04.0776143Z harfbuzz-11.0.0 | 1.6 MB | ########## | 100%  2025-05-07T19:45:04.0776489Z 2025-05-07T19:45:04.0776492Z 2025-05-07T19:45:04.0776496Z 2025-05-07T19:45:04.0776499Z 2025-05-07T19:45:04.0776503Z 2025-05-07T19:45:04.0776506Z 2025-05-07T19:45:04.0776509Z 2025-05-07T19:45:04.0776513Z 2025-05-07T19:45:04.0776521Z 2025-05-07T19:45:04.0776525Z 2025-05-07T19:45:04.0776528Z 2025-05-07T19:45:04.0860456Z harfbuzz-11.0.0 | 1.6 MB | ########## | 100%  2025-05-07T19:45:04.0860832Z 2025-05-07T19:45:04.0860837Z 2025-05-07T19:45:04.0860841Z 2025-05-07T19:45:04.0860844Z 2025-05-07T19:45:04.0860848Z 2025-05-07T19:45:04.0860851Z 2025-05-07T19:45:04.0860855Z 2025-05-07T19:45:04.0860859Z 2025-05-07T19:45:04.0860864Z 2025-05-07T19:45:04.0860868Z 2025-05-07T19:45:04.0861133Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:04.0861425Z 2025-05-07T19:45:04.0861429Z 2025-05-07T19:45:04.0861432Z 2025-05-07T19:45:04.0861443Z 2025-05-07T19:45:04.0861447Z 2025-05-07T19:45:04.0861450Z 2025-05-07T19:45:04.0861454Z 2025-05-07T19:45:04.0861462Z 2025-05-07T19:45:04.0861465Z 2025-05-07T19:45:04.0861469Z 2025-05-07T19:45:04.1163300Z tk-8.6.13 | 3.2 MB | ########## | 100%  2025-05-07T19:45:04.1722002Z openjdk-23.0.2 | 181.4 MB | ######9 | 69% 2025-05-07T19:45:04.1722336Z 2025-05-07T19:45:04.1722341Z 2025-05-07T19:45:04.1722345Z 2025-05-07T19:45:04.1722348Z 2025-05-07T19:45:04.1722352Z 2025-05-07T19:45:04.1722356Z 2025-05-07T19:45:04.1722361Z 
2025-05-07T19:45:04.1722364Z 2025-05-07T19:45:04.1722368Z 2025-05-07T19:45:04.1722371Z 2025-05-07T19:45:04.1722375Z 2025-05-07T19:45:04.1722378Z 2025-05-07T19:45:04.1722755Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:04.1723344Z 2025-05-07T19:45:04.1723350Z 2025-05-07T19:45:04.1723353Z 2025-05-07T19:45:04.1723357Z 2025-05-07T19:45:04.1723361Z 2025-05-07T19:45:04.1723481Z 2025-05-07T19:45:04.1723484Z 2025-05-07T19:45:04.1723488Z 2025-05-07T19:45:04.1723491Z 2025-05-07T19:45:04.1723495Z 2025-05-07T19:45:04.1723498Z 2025-05-07T19:45:04.1723507Z 2025-05-07T19:45:04.2165315Z font-ttf-ubuntu-0.83 | 1.5 MB | ########## | 100%  2025-05-07T19:45:04.2393542Z openjdk-23.0.2 | 181.4 MB | #######3 | 73% 2025-05-07T19:45:04.2393844Z 2025-05-07T19:45:04.2393849Z 2025-05-07T19:45:04.2393879Z 2025-05-07T19:45:04.2393884Z 2025-05-07T19:45:04.2393889Z 2025-05-07T19:45:04.2393893Z 2025-05-07T19:45:04.2393897Z 2025-05-07T19:45:04.2393901Z 2025-05-07T19:45:04.2393905Z 2025-05-07T19:45:04.2393908Z 2025-05-07T19:45:04.2393912Z 2025-05-07T19:45:04.2393916Z 2025-05-07T19:45:04.2394316Z 2025-05-07T19:45:04.2396751Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:04.2397114Z 2025-05-07T19:45:04.2397128Z 2025-05-07T19:45:04.2397132Z 2025-05-07T19:45:04.2397145Z 2025-05-07T19:45:04.2397149Z 2025-05-07T19:45:04.2397152Z 2025-05-07T19:45:04.2397156Z 2025-05-07T19:45:04.2397159Z 2025-05-07T19:45:04.2397162Z 2025-05-07T19:45:04.2397166Z 2025-05-07T19:45:04.2397169Z 2025-05-07T19:45:04.2397173Z 2025-05-07T19:45:04.2397176Z 2025-05-07T19:45:04.3171610Z libgfortran5-15.1.0 | 1.5 MB | ########## | 100%  2025-05-07T19:45:04.4421815Z openjdk-23.0.2 | 181.4 MB | #######7 | 77% 2025-05-07T19:45:04.4422243Z 2025-05-07T19:45:04.4422295Z 2025-05-07T19:45:04.4422303Z 2025-05-07T19:45:04.4422309Z 2025-05-07T19:45:04.4422313Z 2025-05-07T19:45:04.4422318Z 2025-05-07T19:45:04.4422324Z 2025-05-07T19:45:04.4422331Z 2025-05-07T19:45:04.4422375Z 2025-05-07T19:45:04.4422381Z 
2025-05-07T19:45:04.4422385Z 2025-05-07T19:45:04.4422390Z 2025-05-07T19:45:04.4422400Z 2025-05-07T19:45:04.4422440Z 2025-05-07T19:45:04.4422960Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:04.4423386Z 2025-05-07T19:45:04.4423390Z 2025-05-07T19:45:04.4423417Z 2025-05-07T19:45:04.4423420Z 2025-05-07T19:45:04.4423424Z 2025-05-07T19:45:04.4423427Z 2025-05-07T19:45:04.4423431Z 2025-05-07T19:45:04.4423434Z 2025-05-07T19:45:04.4423438Z 2025-05-07T19:45:04.4423441Z 2025-05-07T19:45:04.4423445Z 2025-05-07T19:45:04.4423448Z 2025-05-07T19:45:04.4423451Z 2025-05-07T19:45:04.4423455Z 2025-05-07T19:45:04.4708969Z krb5-1.21.3 | 1.3 MB | ########## | 100%  2025-05-07T19:45:04.5242832Z openjdk-23.0.2 | 181.4 MB | ########1 | 81% 2025-05-07T19:45:04.5243322Z 2025-05-07T19:45:04.5243368Z 2025-05-07T19:45:04.5243376Z 2025-05-07T19:45:04.5243382Z 2025-05-07T19:45:04.5243386Z 2025-05-07T19:45:04.5243413Z 2025-05-07T19:45:04.5243417Z 2025-05-07T19:45:04.5243422Z 2025-05-07T19:45:04.5243445Z 2025-05-07T19:45:04.5243449Z 2025-05-07T19:45:04.5243453Z 2025-05-07T19:45:04.5243457Z 2025-05-07T19:45:04.5243479Z 2025-05-07T19:45:04.5243483Z 2025-05-07T19:45:04.5243486Z 2025-05-07T19:45:04.5243490Z 2025-05-07T19:45:04.5243880Z cairo-1.18.4 | 955 KB | ########## | 100%  2025-05-07T19:45:04.5244287Z 2025-05-07T19:45:04.5244301Z 2025-05-07T19:45:04.5244304Z 2025-05-07T19:45:04.5244308Z 2025-05-07T19:45:04.5244311Z 2025-05-07T19:45:04.5244314Z 2025-05-07T19:45:04.5244318Z 2025-05-07T19:45:04.5244321Z 2025-05-07T19:45:04.5244325Z 2025-05-07T19:45:04.5244328Z 2025-05-07T19:45:04.5244332Z 2025-05-07T19:45:04.5244335Z 2025-05-07T19:45:04.5244339Z 2025-05-07T19:45:04.5244342Z 2025-05-07T19:45:04.5244345Z 2025-05-07T19:45:04.5244349Z 2025-05-07T19:45:04.5735556Z cairo-1.18.4 | 955 KB | ########## | 100%  2025-05-07T19:45:04.7106923Z openjdk-23.0.2 | 181.4 MB | ########4 | 85% 2025-05-07T19:45:04.7107252Z 2025-05-07T19:45:04.7107257Z 2025-05-07T19:45:04.7107264Z 2025-05-07T19:45:04.7107428Z 
2025-05-07T19:45:04.7107432Z 2025-05-07T19:45:04.7107436Z 2025-05-07T19:45:04.7107439Z 2025-05-07T19:45:04.7107443Z 2025-05-07T19:45:04.7107446Z 2025-05-07T19:45:04.7107450Z 2025-05-07T19:45:04.7107453Z 2025-05-07T19:45:04.7107457Z 2025-05-07T19:45:04.7107460Z 2025-05-07T19:45:04.7107464Z 2025-05-07T19:45:04.7107467Z 2025-05-07T19:45:04.7109374Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:04.7109716Z 2025-05-07T19:45:04.7109719Z 2025-05-07T19:45:04.7109723Z 2025-05-07T19:45:04.7109727Z 2025-05-07T19:45:04.7109730Z 2025-05-07T19:45:04.7109734Z 2025-05-07T19:45:04.7109738Z 2025-05-07T19:45:04.7109741Z 2025-05-07T19:45:04.7109744Z 2025-05-07T19:45:04.7109748Z 2025-05-07T19:45:04.7109766Z 2025-05-07T19:45:04.7109769Z 2025-05-07T19:45:04.7109773Z 2025-05-07T19:45:04.7109784Z 2025-05-07T19:45:04.7109792Z 2025-05-07T19:45:04.7109796Z 2025-05-07T19:45:04.7111382Z 2025-05-07T19:45:04.7112451Z pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:04.7112777Z 2025-05-07T19:45:04.7112781Z 2025-05-07T19:45:04.7112784Z 2025-05-07T19:45:04.7112788Z 2025-05-07T19:45:04.7112791Z 2025-05-07T19:45:04.7112801Z 2025-05-07T19:45:04.7112805Z 2025-05-07T19:45:04.7112808Z 2025-05-07T19:45:04.7112812Z 2025-05-07T19:45:04.7112816Z 2025-05-07T19:45:04.7112819Z 2025-05-07T19:45:04.7112823Z 2025-05-07T19:45:04.7112826Z 2025-05-07T19:45:04.7112829Z 2025-05-07T19:45:04.7112833Z 2025-05-07T19:45:04.7116458Z libabseil-20250127.1 | 1.3 MB | ########## | 100%  2025-05-07T19:45:04.7116788Z 2025-05-07T19:45:04.7116792Z 2025-05-07T19:45:04.7116796Z 2025-05-07T19:45:04.7116799Z 2025-05-07T19:45:04.7116803Z 2025-05-07T19:45:04.7116814Z 2025-05-07T19:45:04.7116825Z 2025-05-07T19:45:04.7116829Z 2025-05-07T19:45:04.7116832Z 2025-05-07T19:45:04.7116836Z 2025-05-07T19:45:04.7116839Z 2025-05-07T19:45:04.7116847Z 2025-05-07T19:45:04.7116850Z 2025-05-07T19:45:04.7116854Z 2025-05-07T19:45:04.7116871Z 2025-05-07T19:45:04.7116874Z 2025-05-07T19:45:04.7116878Z 2025-05-07T19:45:04.7752330Z 
pcre2-10.44 | 934 KB | ########## | 100%  2025-05-07T19:45:04.9269113Z openjdk-23.0.2 | 181.4 MB | ########7 | 88% 2025-05-07T19:45:05.0208852Z openjdk-23.0.2 | 181.4 MB | ######### | 91% 2025-05-07T19:45:05.0209171Z 2025-05-07T19:45:05.0209177Z 2025-05-07T19:45:05.0209180Z 2025-05-07T19:45:05.0209184Z 2025-05-07T19:45:05.0209189Z 2025-05-07T19:45:05.0209192Z 2025-05-07T19:45:05.0209196Z 2025-05-07T19:45:05.0209199Z 2025-05-07T19:45:05.0209226Z 2025-05-07T19:45:05.0209231Z 2025-05-07T19:45:05.0209234Z 2025-05-07T19:45:05.0209238Z 2025-05-07T19:45:05.0209241Z 2025-05-07T19:45:05.0209244Z 2025-05-07T19:45:05.0209284Z 2025-05-07T19:45:05.0209287Z 2025-05-07T19:45:05.0209291Z 2025-05-07T19:45:05.0209294Z 2025-05-07T19:45:05.0209331Z 2025-05-07T19:45:05.0212632Z ... (more hidden) ... 2025-05-07T19:45:05.0212972Z 2025-05-07T19:45:05.0212976Z 2025-05-07T19:45:05.0212980Z 2025-05-07T19:45:05.0212983Z 2025-05-07T19:45:05.0212987Z 2025-05-07T19:45:05.0213004Z 2025-05-07T19:45:05.0213007Z 2025-05-07T19:45:05.0213011Z 2025-05-07T19:45:05.0213015Z 2025-05-07T19:45:05.0213019Z 2025-05-07T19:45:05.0213022Z 2025-05-07T19:45:05.0213025Z 2025-05-07T19:45:05.0213029Z 2025-05-07T19:45:05.0213032Z 2025-05-07T19:45:05.0213035Z 2025-05-07T19:45:05.0213039Z 2025-05-07T19:45:05.0213042Z 2025-05-07T19:45:05.0213046Z 2025-05-07T19:45:05.0213049Z 2025-05-07T19:45:05.0268733Z ... (more hidden) ... 
2025-05-07T19:45:05.1270325Z openjdk-23.0.2 | 181.4 MB | #########4 | 94% 2025-05-07T19:45:06.5018309Z openjdk-23.0.2 | 181.4 MB | #########7 | 98% 2025-05-07T19:45:06.5018772Z 2025-05-07T19:45:06.5018779Z 2025-05-07T19:45:06.5018965Z 2025-05-07T19:45:06.5018968Z 2025-05-07T19:45:06.5018972Z 2025-05-07T19:45:06.5018975Z 2025-05-07T19:45:06.5018979Z 2025-05-07T19:45:06.5018982Z 2025-05-07T19:45:06.5018987Z 2025-05-07T19:45:06.5018992Z 2025-05-07T19:45:06.5018995Z 2025-05-07T19:45:06.5018998Z 2025-05-07T19:45:06.5019002Z 2025-05-07T19:45:06.5019005Z 2025-05-07T19:45:06.5019031Z 2025-05-07T19:45:06.5019034Z 2025-05-07T19:45:06.5019038Z 2025-05-07T19:45:06.5019041Z 2025-05-07T19:45:06.5019616Z ncurses-6.5 | 871 KB | ########## | 100%  2025-05-07T19:45:06.5019948Z 2025-05-07T19:45:06.5019952Z 2025-05-07T19:45:06.5019955Z 2025-05-07T19:45:06.5019959Z 2025-05-07T19:45:06.5019962Z 2025-05-07T19:45:06.5019966Z 2025-05-07T19:45:06.5019989Z 2025-05-07T19:45:06.5019992Z 2025-05-07T19:45:06.5019996Z 2025-05-07T19:45:06.5020004Z 2025-05-07T19:45:06.5020008Z 2025-05-07T19:45:06.5020012Z 2025-05-07T19:45:06.5020016Z 2025-05-07T19:45:06.5020023Z 2025-05-07T19:45:06.5020026Z 2025-05-07T19:45:06.5020030Z 2025-05-07T19:45:06.5020033Z 2025-05-07T19:45:06.5020037Z 2025-05-07T19:45:06.6434470Z ncurses-6.5 | 871 KB | ########## | 100%  2025-05-07T19:45:06.6434834Z 2025-05-07T19:45:06.7912570Z bazel-7.5.0 | 47.4 MB | ########## | 100%  2025-05-07T19:45:06.7912832Z 2025-05-07T19:45:06.7912845Z 2025-05-07T19:45:07.4689640Z cmake-4.0.2 | 19.4 MB | ########## | 100%  2025-05-07T19:45:08.6178287Z openjdk-23.0.2 | 181.4 MB | ########## | 100% 2025-05-07T19:45:08.6186101Z openjdk-23.0.2 | 181.4 MB | ########## | 100% 2025-05-07T19:45:08.6186412Z 2025-05-07T19:45:08.6186427Z 2025-05-07T19:45:08.6186462Z 2025-05-07T19:45:08.6186467Z 2025-05-07T19:45:08.6186489Z 2025-05-07T19:45:08.6186498Z 2025-05-07T19:45:08.6186555Z 2025-05-07T19:45:08.6186560Z 2025-05-07T19:45:08.6186707Z 
2025-05-07T19:45:08.6203266Z 2025-05-07T19:45:08.6203273Z 2025-05-07T19:45:08.6203277Z 2025-05-07T19:45:08.6203280Z 2025-05-07T19:45:08.6203283Z 2025-05-07T19:45:08.6203287Z 2025-05-07T19:45:08.6203293Z 2025-05-07T19:45:08.6203297Z 2025-05-07T19:45:08.6203300Z 2025-05-07T19:45:08.6203303Z 2025-05-07T19:45:08.6203307Z 2025-05-07T19:45:08.6203310Z 2025-05-07T19:45:08.6203314Z 2025-05-07T19:45:08.6203317Z 2025-05-07T19:45:08.6203496Z  2025-05-07T19:45:08.6203711Z 2025-05-07T19:45:08.6203715Z 2025-05-07T19:45:08.6203719Z 2025-05-07T19:45:08.6203722Z 2025-05-07T19:45:08.6203725Z 2025-05-07T19:45:08.6203729Z 2025-05-07T19:45:08.6203732Z 2025-05-07T19:45:08.6203735Z 2025-05-07T19:45:08.6203739Z 2025-05-07T19:45:08.6203742Z 2025-05-07T19:45:08.6203745Z 2025-05-07T19:45:08.6203749Z 2025-05-07T19:45:08.6203752Z 2025-05-07T19:45:08.6203772Z 2025-05-07T19:45:08.6203776Z 2025-05-07T19:45:08.6203779Z 2025-05-07T19:45:08.6203782Z 2025-05-07T19:45:08.6203960Z  2025-05-07T19:45:08.6204194Z 2025-05-07T19:45:08.6204197Z 2025-05-07T19:45:08.6204200Z 2025-05-07T19:45:08.6204204Z 2025-05-07T19:45:08.6204210Z 2025-05-07T19:45:08.6204214Z 2025-05-07T19:45:08.6204234Z 2025-05-07T19:45:08.6204238Z 2025-05-07T19:45:08.6204241Z 2025-05-07T19:45:08.6204245Z 2025-05-07T19:45:08.6204248Z 2025-05-07T19:45:08.6204252Z 2025-05-07T19:45:08.6204255Z 2025-05-07T19:45:08.6204259Z 2025-05-07T19:45:08.6204262Z 2025-05-07T19:45:08.6204266Z 2025-05-07T19:45:08.6204269Z 2025-05-07T19:45:08.6204272Z 2025-05-07T19:45:08.6204445Z  2025-05-07T19:45:08.6204688Z 2025-05-07T19:45:08.6204692Z 2025-05-07T19:45:08.6204791Z  2025-05-07T19:45:08.6204900Z 2025-05-07T19:45:08.6204903Z 2025-05-07T19:45:08.6205029Z  2025-05-07T19:45:08.6205141Z 2025-05-07T19:45:08.6205144Z 2025-05-07T19:45:08.6205147Z 2025-05-07T19:45:08.6205255Z  2025-05-07T19:45:08.6205370Z 2025-05-07T19:45:08.6205392Z 2025-05-07T19:45:08.6205455Z 2025-05-07T19:45:08.6205460Z 2025-05-07T19:45:08.6205570Z  2025-05-07T19:45:08.6205695Z 
2025-05-07T19:45:08.6205756Z 2025-05-07T19:45:08.6205759Z 2025-05-07T19:45:08.6205762Z 2025-05-07T19:45:08.6205766Z 2025-05-07T19:45:08.6205905Z  2025-05-07T19:45:08.6206033Z 2025-05-07T19:45:08.6206037Z 2025-05-07T19:45:08.6206040Z 2025-05-07T19:45:08.6206043Z 2025-05-07T19:45:08.6206047Z 2025-05-07T19:45:08.6206050Z 2025-05-07T19:45:08.6206162Z  2025-05-07T19:45:08.6206313Z 2025-05-07T19:45:08.6206316Z 2025-05-07T19:45:08.6206320Z 2025-05-07T19:45:08.6206323Z 2025-05-07T19:45:08.6206327Z 2025-05-07T19:45:08.6206330Z 2025-05-07T19:45:08.6206334Z 2025-05-07T19:45:08.6206447Z  2025-05-07T19:45:08.6206609Z 2025-05-07T19:45:08.6206612Z 2025-05-07T19:45:08.6206616Z 2025-05-07T19:45:08.6206619Z 2025-05-07T19:45:08.6206623Z 2025-05-07T19:45:08.6206626Z 2025-05-07T19:45:08.6206629Z 2025-05-07T19:45:08.6206633Z 2025-05-07T19:45:08.6206756Z  2025-05-07T19:45:08.6206912Z 2025-05-07T19:45:08.6206916Z 2025-05-07T19:45:08.6206919Z 2025-05-07T19:45:08.6206947Z 2025-05-07T19:45:08.6206950Z 2025-05-07T19:45:08.6206954Z 2025-05-07T19:45:08.6206958Z 2025-05-07T19:45:08.6206961Z 2025-05-07T19:45:08.6206964Z 2025-05-07T19:45:08.6207087Z  2025-05-07T19:45:08.6207258Z 2025-05-07T19:45:08.6207261Z 2025-05-07T19:45:08.6207264Z 2025-05-07T19:45:08.6207268Z 2025-05-07T19:45:08.6207271Z 2025-05-07T19:45:08.6207292Z 2025-05-07T19:45:08.6207296Z 2025-05-07T19:45:08.6207299Z 2025-05-07T19:45:08.6207303Z 2025-05-07T19:45:08.6207306Z 2025-05-07T19:45:08.6207436Z  2025-05-07T19:45:08.6207608Z 2025-05-07T19:45:08.6207612Z 2025-05-07T19:45:08.6207615Z 2025-05-07T19:45:08.6207619Z 2025-05-07T19:45:08.6207622Z 2025-05-07T19:45:08.6207625Z 2025-05-07T19:45:08.6207645Z 2025-05-07T19:45:08.6207648Z 2025-05-07T19:45:08.6207652Z 2025-05-07T19:45:08.6207658Z 2025-05-07T19:45:08.6207662Z 2025-05-07T19:45:08.6207921Z  2025-05-07T19:45:08.6208104Z 2025-05-07T19:45:08.6208111Z 2025-05-07T19:45:08.6208114Z 2025-05-07T19:45:08.6208118Z 2025-05-07T19:45:08.6208138Z 2025-05-07T19:45:08.6208141Z 
2025-05-07T19:45:08.6208145Z 2025-05-07T19:45:08.6208148Z 2025-05-07T19:45:08.6208151Z 2025-05-07T19:45:08.6208155Z 2025-05-07T19:45:08.6208158Z 2025-05-07T19:45:08.6208162Z 2025-05-07T19:45:08.6208300Z  2025-05-07T19:45:08.6208489Z 2025-05-07T19:45:08.6208492Z 2025-05-07T19:45:08.6208495Z 2025-05-07T19:45:08.6208517Z 2025-05-07T19:45:08.6208520Z 2025-05-07T19:45:08.6208523Z 2025-05-07T19:45:08.6208527Z 2025-05-07T19:45:08.6208530Z 2025-05-07T19:45:08.6208533Z 2025-05-07T19:45:08.6208537Z 2025-05-07T19:45:08.6208540Z 2025-05-07T19:45:08.6208543Z 2025-05-07T19:45:08.6208547Z 2025-05-07T19:45:08.6208941Z  2025-05-07T19:45:08.6209148Z 2025-05-07T19:45:08.6209171Z 2025-05-07T19:45:08.6209174Z 2025-05-07T19:45:08.6209178Z 2025-05-07T19:45:08.6209181Z 2025-05-07T19:45:08.6209188Z 2025-05-07T19:45:08.6209191Z 2025-05-07T19:45:08.6209195Z 2025-05-07T19:45:08.6209198Z 2025-05-07T19:45:08.6209202Z 2025-05-07T19:45:08.6209205Z 2025-05-07T19:45:08.6209209Z 2025-05-07T19:45:08.6209212Z 2025-05-07T19:45:08.6209216Z 2025-05-07T19:45:08.6209369Z  2025-05-07T19:45:08.6209607Z 2025-05-07T19:45:08.6209610Z 2025-05-07T19:45:08.6209613Z 2025-05-07T19:45:08.6209617Z 2025-05-07T19:45:08.6209620Z 2025-05-07T19:45:08.6209623Z 2025-05-07T19:45:08.6209627Z 2025-05-07T19:45:08.6209630Z 2025-05-07T19:45:08.6209634Z 2025-05-07T19:45:08.6209638Z 2025-05-07T19:45:08.6209641Z 2025-05-07T19:45:08.6209644Z 2025-05-07T19:45:08.6209648Z 2025-05-07T19:45:08.6209651Z 2025-05-07T19:45:08.6209654Z 2025-05-07T19:45:08.6209828Z  2025-05-07T19:45:08.6210037Z 2025-05-07T19:45:08.6210156Z 2025-05-07T19:45:08.6210160Z 2025-05-07T19:45:08.6210164Z 2025-05-07T19:45:08.6210168Z 2025-05-07T19:45:08.6210171Z 2025-05-07T19:45:08.6210267Z 2025-05-07T19:45:08.6210271Z 2025-05-07T19:45:08.6210274Z 2025-05-07T19:45:08.6210277Z 2025-05-07T19:45:08.6210281Z 2025-05-07T19:45:08.6210284Z 2025-05-07T19:45:08.6210288Z 2025-05-07T19:45:08.6210291Z 2025-05-07T19:45:08.6210294Z 2025-05-07T19:45:08.6210298Z 
2025-05-07T19:45:08.6210480Z  2025-05-07T19:45:08.6210699Z 2025-05-07T19:45:08.6210702Z 2025-05-07T19:45:08.6210706Z 2025-05-07T19:45:08.6210709Z 2025-05-07T19:45:08.6210713Z 2025-05-07T19:45:08.6210716Z 2025-05-07T19:45:08.6210719Z 2025-05-07T19:45:08.6210723Z 2025-05-07T19:45:08.6210726Z 2025-05-07T19:45:08.6210730Z 2025-05-07T19:45:08.6210733Z 2025-05-07T19:45:08.6210756Z 2025-05-07T19:45:08.6210759Z 2025-05-07T19:45:08.6210763Z 2025-05-07T19:45:08.6210766Z 2025-05-07T19:45:08.6210769Z 2025-05-07T19:45:08.6210777Z 2025-05-07T19:45:08.6210939Z  2025-05-07T19:45:08.6211161Z 2025-05-07T19:45:08.6211169Z 2025-05-07T19:45:08.6211173Z 2025-05-07T19:45:08.6211176Z 2025-05-07T19:45:08.6211180Z 2025-05-07T19:45:08.6211201Z 2025-05-07T19:45:08.6211205Z 2025-05-07T19:45:08.6211208Z 2025-05-07T19:45:08.6211211Z 2025-05-07T19:45:08.6211215Z 2025-05-07T19:45:08.6211218Z 2025-05-07T19:45:08.6211221Z 2025-05-07T19:45:08.6211225Z 2025-05-07T19:45:08.6211228Z 2025-05-07T19:45:08.6211232Z 2025-05-07T19:45:08.6211235Z 2025-05-07T19:45:08.6211238Z 2025-05-07T19:45:08.6211242Z 2025-05-07T19:45:08.6211410Z  2025-05-07T19:45:08.6211655Z 2025-05-07T19:45:08.6211659Z 2025-05-07T19:45:08.6211759Z  2025-05-07T19:45:08.6211869Z 2025-05-07T19:45:08.6211873Z 2025-05-07T19:45:08.6211995Z  2025-05-07T19:45:08.6212106Z 2025-05-07T19:45:08.6212110Z 2025-05-07T19:45:08.6212114Z 2025-05-07T19:45:08.6212222Z  2025-05-07T19:45:08.6212356Z 2025-05-07T19:45:08.6212360Z 2025-05-07T19:45:08.6212364Z 2025-05-07T19:45:08.6212367Z 2025-05-07T19:45:08.6212478Z  2025-05-07T19:45:08.6212599Z 2025-05-07T19:45:08.6212602Z 2025-05-07T19:45:08.6212606Z 2025-05-07T19:45:08.6212609Z 2025-05-07T19:45:08.6212612Z 2025-05-07T19:45:08.6212740Z  2025-05-07T19:45:08.6212871Z 2025-05-07T19:45:08.6212874Z 2025-05-07T19:45:08.6212878Z 2025-05-07T19:45:08.6212881Z 2025-05-07T19:45:08.6212885Z 2025-05-07T19:45:08.6212888Z 2025-05-07T19:45:08.6213001Z  2025-05-07T19:45:08.6213151Z 2025-05-07T19:45:08.6213154Z 
2025-05-07T19:45:08.6213157Z 2025-05-07T19:45:08.6213161Z 2025-05-07T19:45:08.6213165Z 2025-05-07T19:45:08.6213168Z 2025-05-07T19:45:08.6213171Z 2025-05-07T19:45:08.6213294Z  2025-05-07T19:45:08.6213456Z 2025-05-07T19:45:08.6213460Z 2025-05-07T19:45:08.6213463Z 2025-05-07T19:45:08.6213466Z 2025-05-07T19:45:08.6213470Z 2025-05-07T19:45:08.6213477Z 2025-05-07T19:45:08.6213481Z 2025-05-07T19:45:08.6213484Z 2025-05-07T19:45:08.6213606Z  2025-05-07T19:45:08.6213771Z 2025-05-07T19:45:08.6213775Z 2025-05-07T19:45:08.6213795Z 2025-05-07T19:45:08.6213799Z 2025-05-07T19:45:08.6213802Z 2025-05-07T19:45:08.6213806Z 2025-05-07T19:45:08.6213809Z 2025-05-07T19:45:08.6213812Z 2025-05-07T19:45:08.6213816Z 2025-05-07T19:45:08.6213940Z  2025-05-07T19:45:08.6214101Z 2025-05-07T19:45:08.6214106Z 2025-05-07T19:45:08.6214109Z 2025-05-07T19:45:08.6214113Z 2025-05-07T19:45:08.6214134Z 2025-05-07T19:45:08.6214138Z 2025-05-07T19:45:08.6214141Z 2025-05-07T19:45:08.6214144Z 2025-05-07T19:45:08.6214148Z 2025-05-07T19:45:08.6214151Z 2025-05-07T19:45:08.6214282Z  2025-05-07T19:45:08.6214457Z 2025-05-07T19:45:08.6214461Z 2025-05-07T19:45:08.6214465Z 2025-05-07T19:45:08.6214468Z 2025-05-07T19:45:08.6214488Z 2025-05-07T19:45:08.6214492Z 2025-05-07T19:45:08.6214495Z 2025-05-07T19:45:08.6214557Z 2025-05-07T19:45:08.6214562Z 2025-05-07T19:45:08.6214565Z 2025-05-07T19:45:08.6214568Z 2025-05-07T19:45:08.6214764Z  2025-05-07T19:45:08.6214947Z 2025-05-07T19:45:08.6214951Z 2025-05-07T19:45:08.6214955Z 2025-05-07T19:45:08.6214959Z 2025-05-07T19:45:08.6214982Z 2025-05-07T19:45:08.6214985Z 2025-05-07T19:45:08.6214989Z 2025-05-07T19:45:08.6214992Z 2025-05-07T19:45:08.6214996Z 2025-05-07T19:45:08.6214999Z 2025-05-07T19:45:08.6215002Z 2025-05-07T19:45:08.6215006Z 2025-05-07T19:45:08.6215142Z  2025-05-07T19:45:08.6215337Z 2025-05-07T19:45:08.6215341Z 2025-05-07T19:45:08.6215345Z 2025-05-07T19:45:08.6215364Z 2025-05-07T19:45:08.6215368Z 2025-05-07T19:45:08.6215371Z 2025-05-07T19:45:08.6215374Z 
2025-05-07T19:45:08.6215378Z 2025-05-07T19:45:08.6215381Z 2025-05-07T19:45:08.6215385Z 2025-05-07T19:45:08.6215388Z 2025-05-07T19:45:08.6215392Z 2025-05-07T19:45:08.6215395Z 2025-05-07T19:45:08.6215541Z  2025-05-07T19:45:08.6215739Z 2025-05-07T19:45:08.6215761Z 2025-05-07T19:45:08.6215765Z 2025-05-07T19:45:08.6215772Z 2025-05-07T19:45:08.6215775Z 2025-05-07T19:45:08.6215779Z 2025-05-07T19:45:08.6215782Z 2025-05-07T19:45:08.6215786Z 2025-05-07T19:45:08.6215789Z 2025-05-07T19:45:08.6215792Z 2025-05-07T19:45:08.6215796Z 2025-05-07T19:45:08.6215799Z 2025-05-07T19:45:08.6215803Z 2025-05-07T19:45:08.6215806Z 2025-05-07T19:45:08.6215950Z  2025-05-07T19:45:08.6216172Z 2025-05-07T19:45:08.6216175Z 2025-05-07T19:45:08.6216179Z 2025-05-07T19:45:08.6216183Z 2025-05-07T19:45:08.6216186Z 2025-05-07T19:45:08.6216190Z 2025-05-07T19:45:08.6216193Z 2025-05-07T19:45:08.6216197Z 2025-05-07T19:45:08.6216200Z 2025-05-07T19:45:08.6216203Z 2025-05-07T19:45:08.6216207Z 2025-05-07T19:45:08.6216210Z 2025-05-07T19:45:08.6216214Z 2025-05-07T19:45:08.6216217Z 2025-05-07T19:45:08.6216220Z 2025-05-07T19:45:08.6216390Z  2025-05-07T19:45:08.6216597Z 2025-05-07T19:45:08.6216601Z 2025-05-07T19:45:08.6216605Z 2025-05-07T19:45:08.6216612Z 2025-05-07T19:45:08.6216615Z 2025-05-07T19:45:08.6216619Z 2025-05-07T19:45:08.6216622Z 2025-05-07T19:45:08.6216625Z 2025-05-07T19:45:08.6216629Z 2025-05-07T19:45:08.6216632Z 2025-05-07T19:45:08.6216636Z 2025-05-07T19:45:08.6216639Z 2025-05-07T19:45:08.6216642Z 2025-05-07T19:45:08.6216646Z 2025-05-07T19:45:08.6216649Z 2025-05-07T19:45:08.6216653Z 2025-05-07T19:45:08.6216824Z  2025-05-07T19:45:08.6217035Z 2025-05-07T19:45:08.6217039Z 2025-05-07T19:45:08.6217042Z 2025-05-07T19:45:08.6217046Z 2025-05-07T19:45:08.6217049Z 2025-05-07T19:45:08.6217053Z 2025-05-07T19:45:08.6217056Z 2025-05-07T19:45:08.6217060Z 2025-05-07T19:45:08.6217063Z 2025-05-07T19:45:08.6217067Z 2025-05-07T19:45:08.6217070Z 2025-05-07T19:45:08.6217090Z 2025-05-07T19:45:08.6217093Z 
2025-05-07T19:45:08.6217096Z 2025-05-07T19:45:08.6217103Z 2025-05-07T19:45:08.6217106Z 2025-05-07T19:45:08.6217109Z 2025-05-07T19:45:08.6217269Z  2025-05-07T19:45:08.6217491Z 2025-05-07T19:45:08.6217495Z 2025-05-07T19:45:08.6217499Z 2025-05-07T19:45:08.6217502Z 2025-05-07T19:45:08.6217505Z 2025-05-07T19:45:08.6217525Z 2025-05-07T19:45:08.6217528Z 2025-05-07T19:45:08.6217532Z 2025-05-07T19:45:08.6217535Z 2025-05-07T19:45:08.6217538Z 2025-05-07T19:45:08.6217542Z 2025-05-07T19:45:08.6217545Z 2025-05-07T19:45:08.6217549Z 2025-05-07T19:45:08.6217553Z 2025-05-07T19:45:08.6217556Z 2025-05-07T19:45:08.6217559Z 2025-05-07T19:45:08.6217563Z 2025-05-07T19:45:08.6217566Z 2025-05-07T19:45:08.6217732Z  2025-05-07T19:45:08.6217971Z 2025-05-07T19:45:08.6217974Z 2025-05-07T19:45:08.6218074Z  2025-05-07T19:45:08.6218181Z 2025-05-07T19:45:08.6218184Z 2025-05-07T19:45:08.6218302Z  2025-05-07T19:45:08.6218409Z 2025-05-07T19:45:08.6218469Z 2025-05-07T19:45:08.6218474Z 2025-05-07T19:45:08.6218580Z  2025-05-07T19:45:08.6218708Z 2025-05-07T19:45:08.6218712Z 2025-05-07T19:45:08.6218774Z 2025-05-07T19:45:08.6218777Z 2025-05-07T19:45:08.6218885Z  2025-05-07T19:45:08.6219006Z 2025-05-07T19:45:08.6219009Z 2025-05-07T19:45:08.6219013Z 2025-05-07T19:45:08.6219016Z 2025-05-07T19:45:08.6219019Z 2025-05-07T19:45:08.6219143Z  2025-05-07T19:45:08.6219269Z 2025-05-07T19:45:08.6219273Z 2025-05-07T19:45:08.6219276Z 2025-05-07T19:45:08.6219279Z 2025-05-07T19:45:08.6219283Z 2025-05-07T19:45:08.6219287Z 2025-05-07T19:45:08.6219399Z  2025-05-07T19:45:08.6219549Z 2025-05-07T19:45:08.6219552Z 2025-05-07T19:45:08.6219556Z 2025-05-07T19:45:08.6219559Z 2025-05-07T19:45:08.6219562Z 2025-05-07T19:45:08.6219566Z 2025-05-07T19:45:08.6219569Z 2025-05-07T19:45:08.6219686Z  2025-05-07T19:45:08.6219844Z 2025-05-07T19:45:08.6219847Z 2025-05-07T19:45:08.6219851Z 2025-05-07T19:45:08.6219857Z 2025-05-07T19:45:08.6219861Z 2025-05-07T19:45:08.6219864Z 2025-05-07T19:45:08.6219868Z 2025-05-07T19:45:08.6219872Z 
2025-05-07T19:45:08.6219997Z  2025-05-07T19:45:08.6220151Z 2025-05-07T19:45:08.6220155Z 2025-05-07T19:45:08.6220176Z 2025-05-07T19:45:08.6220179Z 2025-05-07T19:45:08.6220182Z 2025-05-07T19:45:08.6220186Z 2025-05-07T19:45:08.6220190Z 2025-05-07T19:45:08.6220194Z 2025-05-07T19:45:08.6220197Z 2025-05-07T19:45:08.6220321Z  2025-05-07T19:45:08.6220485Z 2025-05-07T19:45:08.6220488Z 2025-05-07T19:45:08.6220492Z 2025-05-07T19:45:08.6220512Z 2025-05-07T19:45:08.6220516Z 2025-05-07T19:45:08.6220519Z 2025-05-07T19:45:08.6220523Z 2025-05-07T19:45:08.6220527Z 2025-05-07T19:45:08.6220530Z 2025-05-07T19:45:08.6220534Z 2025-05-07T19:45:08.6220669Z  2025-05-07T19:45:08.6220839Z 2025-05-07T19:45:08.6220843Z 2025-05-07T19:45:08.6220846Z 2025-05-07T19:45:08.6220849Z 2025-05-07T19:45:08.6220874Z 2025-05-07T19:45:08.6220878Z 2025-05-07T19:45:08.6220881Z 2025-05-07T19:45:08.6220885Z 2025-05-07T19:45:08.6220888Z 2025-05-07T19:45:08.6220894Z 2025-05-07T19:45:08.6220898Z 2025-05-07T19:45:08.6221031Z  2025-05-07T19:45:08.6221212Z 2025-05-07T19:45:08.6221215Z 2025-05-07T19:45:08.6221219Z 2025-05-07T19:45:08.6221222Z 2025-05-07T19:45:08.6221242Z 2025-05-07T19:45:08.6221246Z 2025-05-07T19:45:08.6221249Z 2025-05-07T19:45:08.6221253Z 2025-05-07T19:45:08.6221257Z 2025-05-07T19:45:08.6221260Z 2025-05-07T19:45:08.6221263Z 2025-05-07T19:45:08.6221267Z 2025-05-07T19:45:08.6221404Z  2025-05-07T19:45:08.6221593Z 2025-05-07T19:45:08.6221597Z 2025-05-07T19:45:08.6221601Z 2025-05-07T19:45:08.6221621Z 2025-05-07T19:45:08.6221625Z 2025-05-07T19:45:08.6221628Z 2025-05-07T19:45:08.6221631Z 2025-05-07T19:45:08.6221635Z 2025-05-07T19:45:08.6221638Z 2025-05-07T19:45:08.6221642Z 2025-05-07T19:45:08.6221645Z 2025-05-07T19:45:08.6221652Z 2025-05-07T19:45:08.6221655Z 2025-05-07T19:45:08.6221796Z  2025-05-07T19:45:08.6221994Z 2025-05-07T19:45:08.6222019Z 2025-05-07T19:45:08.6222023Z 2025-05-07T19:45:08.6222027Z 2025-05-07T19:45:08.6222030Z 2025-05-07T19:45:08.6222034Z 2025-05-07T19:45:08.6222037Z 
2025-05-07T19:45:08.6222040Z 2025-05-07T19:45:08.6222044Z 2025-05-07T19:45:08.6222047Z 2025-05-07T19:45:08.6222051Z 2025-05-07T19:45:08.6222054Z 2025-05-07T19:45:08.6222057Z 2025-05-07T19:45:08.6222061Z 2025-05-07T19:45:08.6222208Z  2025-05-07T19:45:08.6222431Z 2025-05-07T19:45:08.6222435Z 2025-05-07T19:45:08.6222438Z 2025-05-07T19:45:08.6222442Z 2025-05-07T19:45:08.6222445Z 2025-05-07T19:45:08.6222449Z 2025-05-07T19:45:08.6222452Z 2025-05-07T19:45:08.6222456Z 2025-05-07T19:45:08.6222459Z 2025-05-07T19:45:08.6222462Z 2025-05-07T19:45:08.6222466Z 2025-05-07T19:45:08.6222469Z 2025-05-07T19:45:08.6222473Z 2025-05-07T19:45:08.6222476Z 2025-05-07T19:45:08.6222535Z 2025-05-07T19:45:08.6222704Z  2025-05-07T19:45:08.6222912Z 2025-05-07T19:45:08.6222972Z 2025-05-07T19:45:08.6222976Z 2025-05-07T19:45:08.6222980Z 2025-05-07T19:45:08.6222983Z 2025-05-07T19:45:08.6222987Z 2025-05-07T19:45:08.6222990Z 2025-05-07T19:45:08.6222994Z 2025-05-07T19:45:08.6222997Z 2025-05-07T19:45:08.6223001Z 2025-05-07T19:45:08.6223004Z 2025-05-07T19:45:08.6223007Z 2025-05-07T19:45:08.6223011Z 2025-05-07T19:45:08.6223014Z 2025-05-07T19:45:08.6223018Z 2025-05-07T19:45:08.6223021Z 2025-05-07T19:45:08.6223198Z  2025-05-07T19:45:08.6223526Z 2025-05-07T19:45:08.6223530Z 2025-05-07T19:45:08.6223534Z 2025-05-07T19:45:08.6223537Z 2025-05-07T19:45:08.6223541Z 2025-05-07T19:45:08.6223545Z 2025-05-07T19:45:08.6223548Z 2025-05-07T19:45:08.6223551Z 2025-05-07T19:45:08.6223555Z 2025-05-07T19:45:08.6223558Z 2025-05-07T19:45:08.6223562Z 2025-05-07T19:45:08.6223585Z 2025-05-07T19:45:08.6223592Z 2025-05-07T19:45:08.6223596Z 2025-05-07T19:45:08.6223599Z 2025-05-07T19:45:08.6223603Z 2025-05-07T19:45:08.6223610Z 2025-05-07T19:45:08.6223782Z  2025-05-07T19:45:08.6224006Z 2025-05-07T19:45:08.6224009Z 2025-05-07T19:45:08.6224014Z 2025-05-07T19:45:08.6224017Z 2025-05-07T19:45:08.6224020Z 2025-05-07T19:45:08.6224045Z 2025-05-07T19:45:08.6224048Z 2025-05-07T19:45:08.6224052Z 2025-05-07T19:45:08.6224056Z 
2025-05-07T19:45:08.6224616Z done 2025-05-07T19:45:08.9425868Z Preparing transaction: | / - done 2025-05-07T19:45:12.1055754Z Verifying transaction: | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:45:14.8236408Z Executing transaction: \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:45:15.1331286Z [INSTALL] Adding symlink librhash.so.0, which is needed by CMake ... 2025-05-07T19:45:17.0185237Z + ln -s /github/home/miniconda/envs/build_binary/lib/librhash.so /github/home/miniconda/envs/build_binary/lib/librhash.so.0 2025-05-07T19:45:17.0185936Z 2025-05-07T19:45:17.0210323Z 2025-05-07T19:45:17.0237234Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install build 2025-05-07T19:45:19.3749288Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:45:19.3750970Z 2025-05-07T19:45:19.3751093Z Collecting build 2025-05-07T19:45:19.3751511Z Downloading build-1.2.2.post1-py3-none-any.whl.metadata (6.5 kB) 2025-05-07T19:45:19.3752381Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from build) (25.0) 2025-05-07T19:45:19.3753152Z Collecting pyproject_hooks (from build) 2025-05-07T19:45:19.3753610Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl.metadata (1.3 kB) 2025-05-07T19:45:19.3754155Z Downloading build-1.2.2.post1-py3-none-any.whl (22 kB) 2025-05-07T19:45:19.3754624Z Downloading pyproject_hooks-1.2.0-py3-none-any.whl (10 kB) 2025-05-07T19:45:19.3755116Z Installing collected packages: pyproject_hooks, build 2025-05-07T19:45:19.3755391Z 2025-05-07T19:45:19.3755630Z Successfully installed build-1.2.2.post1 pyproject_hooks-1.2.0 2025-05-07T19:45:19.3755940Z 2025-05-07T19:45:21.2566969Z /github/home/miniconda/envs/build_binary/bin/make 2025-05-07T19:45:21.2567861Z 2025-05-07T19:45:21.3174626Z [CHECK] Binary make found in PATH 2025-05-07T19:45:23.1032251Z /github/home/miniconda/envs/build_binary/bin/cmake 2025-05-07T19:45:23.1033067Z 2025-05-07T19:45:23.1605756Z [CHECK] Binary cmake found in PATH 2025-05-07T19:45:24.9443756Z /github/home/miniconda/envs/build_binary/bin/ninja 2025-05-07T19:45:24.9444566Z 2025-05-07T19:45:25.0027203Z [CHECK] Binary ninja found in PATH 2025-05-07T19:45:26.9125342Z [CHECK] Python (sub-)package 'click' found ... 2025-05-07T19:45:28.9499141Z [CHECK] Python (sub-)package 'hypothesis' found ... 2025-05-07T19:45:30.8712753Z [CHECK] Python (sub-)package 'jinja2' found ... 2025-05-07T19:45:32.8585006Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:45:34.7425326Z [CHECK] Python (sub-)package 'wheel' found ... 2025-05-07T19:45:34.7425866Z [INSTALL] Successfully installed all the build tools 2025-05-07T19:45:34.7500319Z ##[group]Run . 
$PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:34.7500802Z . $PRELUDE; install_cuda $BUILD_ENV 12.8.0 2025-05-07T19:45:34.7501446Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:45:34.7501841Z env: 2025-05-07T19:45:34.7502080Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:45:34.7502425Z BUILD_ENV: build_binary 2025-05-07T19:45:34.7502682Z BUILD_TARGET: genai 2025-05-07T19:45:34.7502949Z BUILD_VARIANT: cuda 2025-05-07T19:45:34.7503233Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:45:34.7503626Z ##[endgroup] 2025-05-07T19:45:35.1911294Z ################################################################################ 2025-05-07T19:45:35.1912348Z # Install CUDA 2025-05-07T19:45:35.1912955Z # 2025-05-07T19:45:35.1927871Z # [2025-05-07T19:45:35.192Z] + install_cuda build_binary 12.8.0 2025-05-07T19:45:35.1929123Z ################################################################################ 2025-05-07T19:45:35.1929929Z 2025-05-07T19:45:35.1943737Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:45:35.2833117Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:45:35.2834182Z [SETUP] Cleaning up Conda packages ... 2025-05-07T19:45:35.2840304Z + conda clean --packages --tarball -y 2025-05-07T19:45:35.2840536Z 2025-05-07T19:45:35.8141031Z Will remove 136 (501.5 MB) tarball(s). 2025-05-07T19:45:35.8141600Z Will remove 19 (112.8 MB) package(s). 2025-05-07T19:45:35.8699698Z 2025-05-07T19:45:35.8714230Z + conda clean --all -y 2025-05-07T19:45:35.8714452Z 2025-05-07T19:45:36.4869976Z There are no unused tarball(s) to remove. 2025-05-07T19:45:36.4870998Z Will remove 1 index cache(s). 2025-05-07T19:45:36.4871840Z There are no unused package(s) to remove. 2025-05-07T19:45:36.4872796Z There are no tempfile(s) to remove. 2025-05-07T19:45:36.4873632Z There are no logfile(s) to remove. 2025-05-07T19:45:36.5455625Z 2025-05-07T19:45:36.5465336Z [INSTALL] Installing CUDA 12.8.0 ... 
2025-05-07T19:45:36.5493382Z [EXEC] [ATTEMPT 0/3] + conda install --force-reinstall -n build_binary -c conda-forge --override-channels -y cuda=12.8.0 2025-05-07T19:45:37.3986568Z Channels: 2025-05-07T19:45:37.3987246Z - conda-forge 2025-05-07T19:45:37.3987936Z Platform: linux-64 2025-05-07T19:45:47.5568967Z Collecting package metadata (repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / done 2025-05-07T19:45:51.2142232Z Solving environment: \ | / - \ | done 2025-05-07T19:45:51.3390499Z 2025-05-07T19:45:51.3390988Z ## Package Plan ## 2025-05-07T19:45:51.3391158Z 2025-05-07T19:45:51.3391362Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:45:51.3391775Z 2025-05-07T19:45:51.3391882Z added / updated specs: 2025-05-07T19:45:51.3392163Z - cuda=12.8.0 2025-05-07T19:45:51.3392304Z 2025-05-07T19:45:51.3392309Z 2025-05-07T19:45:51.3392434Z The following packages will be downloaded: 2025-05-07T19:45:51.3392679Z 2025-05-07T19:45:51.3392834Z package | build 2025-05-07T19:45:51.3393184Z ---------------------------|----------------- 2025-05-07T19:45:51.3393545Z attr-2.5.1 | h166bdaf_1 69 KB conda-forge 2025-05-07T19:45:51.3394381Z binutils-2.40 | h4852527_7 31 KB conda-forge 2025-05-07T19:45:51.3394830Z c-compiler-1.5.2 | h0b41bf4_0 6 KB conda-forge 2025-05-07T19:45:51.3395273Z cuda-12.8.0 | ha804496_0 26 KB conda-forge 2025-05-07T19:45:51.3395709Z cuda-cccl_linux-64-12.8.55 | ha770c72_1 1.0 MB conda-forge 2025-05-07T19:45:51.3396244Z cuda-command-line-tools-12.8.0| ha770c72_0 20 KB conda-forge 2025-05-07T19:45:51.3396767Z cuda-compiler-12.8.0 | hbad6d8a_0 20 KB conda-forge 2025-05-07T19:45:51.3397250Z cuda-crt-dev_linux-64-12.8.61| ha770c72_1 90 KB conda-forge 2025-05-07T19:45:51.3397957Z cuda-crt-tools-12.8.61 | ha770c72_1 27 KB conda-forge 2025-05-07T19:45:51.3398424Z cuda-cudart-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:45:51.3398917Z cuda-cudart-dev-12.8.57 | h5888daf_1 23 KB conda-forge 2025-05-07T19:45:51.3399438Z 
cuda-cudart-dev_linux-64-12.8.57| h3f2d84a_1 377 KB conda-forge 2025-05-07T19:45:51.3399954Z cuda-cudart-static-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:45:51.3400501Z cuda-cudart-static_linux-64-12.8.57| h3f2d84a_1 950 KB conda-forge 2025-05-07T19:45:51.3401028Z cuda-cudart_linux-64-12.8.57| h3f2d84a_1 188 KB conda-forge 2025-05-07T19:45:51.3401536Z cuda-cuobjdump-12.8.55 | hbd13f7d_0 227 KB conda-forge 2025-05-07T19:45:51.3401999Z cuda-cupti-12.8.57 | hbd13f7d_0 1.8 MB conda-forge 2025-05-07T19:45:51.3402594Z cuda-cupti-dev-12.8.57 | h5888daf_0 4.0 MB conda-forge 2025-05-07T19:45:51.3403057Z cuda-cuxxfilt-12.8.55 | hbd13f7d_0 211 KB conda-forge 2025-05-07T19:45:51.3403509Z cuda-driver-dev-12.8.57 | h5888daf_1 22 KB conda-forge 2025-05-07T19:45:51.3404020Z cuda-driver-dev_linux-64-12.8.90| h3f2d84a_1 36 KB conda-forge 2025-05-07T19:45:51.3404482Z cuda-gdb-12.8.55 | h50b4baa_0 353 KB conda-forge 2025-05-07T19:45:51.3404927Z cuda-libraries-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:45:51.3405387Z cuda-libraries-dev-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:45:51.3405859Z cuda-nsight-12.8.55 | h7938cbb_0 113.2 MB conda-forge 2025-05-07T19:45:51.3406294Z cuda-nvcc-12.8.61 | hcdd1206_0 23 KB conda-forge 2025-05-07T19:45:51.3406742Z cuda-nvcc-dev_linux-64-12.8.61| he91c749_1 12.7 MB conda-forge 2025-05-07T19:45:51.3407223Z cuda-nvcc-impl-12.8.61 | h85509e4_1 25 KB conda-forge 2025-05-07T19:45:51.3407668Z cuda-nvcc-tools-12.8.61 | he02047a_1 24.5 MB conda-forge 2025-05-07T19:45:51.3408150Z cuda-nvcc_linux-64-12.8.61 | h04802cd_0 25 KB conda-forge 2025-05-07T19:45:51.3408621Z cuda-nvdisasm-12.8.55 | hbd13f7d_0 4.9 MB conda-forge 2025-05-07T19:45:51.3409495Z cuda-nvml-dev-12.8.55 | hbd13f7d_0 134 KB conda-forge 2025-05-07T19:45:51.3409995Z cuda-nvprof-12.8.57 | hbd13f7d_0 2.5 MB conda-forge 2025-05-07T19:45:51.3410471Z cuda-nvprune-12.8.55 | hbd13f7d_0 68 KB conda-forge 2025-05-07T19:45:51.3410961Z cuda-nvrtc-12.8.61 | hbd13f7d_0 63.1 MB 
conda-forge 2025-05-07T19:45:51.3411433Z cuda-nvrtc-dev-12.8.61 | h5888daf_0 34 KB conda-forge 2025-05-07T19:45:51.3411916Z cuda-nvtx-12.8.55 | hbd13f7d_0 31 KB conda-forge 2025-05-07T19:45:51.3412415Z cuda-nvvm-dev_linux-64-12.8.61| ha770c72_1 25 KB conda-forge 2025-05-07T19:45:51.3412911Z cuda-nvvm-impl-12.8.61 | he02047a_1 20.8 MB conda-forge 2025-05-07T19:45:51.3413572Z cuda-nvvm-tools-12.8.61 | he02047a_1 23.5 MB conda-forge 2025-05-07T19:45:51.3414039Z cuda-nvvp-12.8.57 | hbd13f7d_0 112.4 MB conda-forge 2025-05-07T19:45:51.3414517Z cuda-opencl-12.8.55 | hbd13f7d_0 29 KB conda-forge 2025-05-07T19:45:51.3415015Z cuda-opencl-dev-12.8.55 | h5888daf_0 95 KB conda-forge 2025-05-07T19:45:51.3415732Z cuda-profiler-api-12.8.55 | h7938cbb_0 22 KB conda-forge 2025-05-07T19:45:51.3416207Z cuda-runtime-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:45:51.3416663Z cuda-sanitizer-api-12.8.55 | hbd13f7d_0 8.8 MB conda-forge 2025-05-07T19:45:51.3417275Z cuda-toolkit-12.8.0 | ha804496_0 20 KB conda-forge 2025-05-07T19:45:51.3417702Z cuda-tools-12.8.0 | ha770c72_0 19 KB conda-forge 2025-05-07T19:45:51.3418145Z cuda-version-12.8 | h5d125a7_3 21 KB conda-forge 2025-05-07T19:45:51.3418693Z cuda-visual-tools-12.8.0 | ha770c72_0 20 KB conda-forge 2025-05-07T19:45:51.3419164Z cxx-compiler-1.5.2 | hf52228f_0 6 KB conda-forge 2025-05-07T19:45:51.3419567Z dbus-1.13.6 | h5008d03_3 604 KB conda-forge 2025-05-07T19:45:51.3419959Z gcc-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:45:51.3420351Z gds-tools-1.13.0.11 | h5888daf_0 37.9 MB conda-forge 2025-05-07T19:45:51.3420762Z gmp-6.3.0 | hac33072_2 449 KB conda-forge 2025-05-07T19:45:51.3421147Z gxx-11.4.0 | h602e360_13 49 KB conda-forge 2025-05-07T19:45:51.3421523Z libcap-2.75 | h39aace5_0 118 KB conda-forge 2025-05-07T19:45:51.3421950Z libcublas-12.8.3.14 | h9ab20c4_0 460.2 MB conda-forge 2025-05-07T19:45:51.3422389Z libcublas-dev-12.8.3.14 | h9ab20c4_0 89 KB conda-forge 2025-05-07T19:45:51.3422836Z libcufft-11.3.3.41 | 
hbd13f7d_0 147.4 MB conda-forge 2025-05-07T19:45:51.3423262Z libcufft-dev-11.3.3.41 | h5888daf_0 33 KB conda-forge 2025-05-07T19:45:51.3424023Z libcufile-1.13.0.11 | h12f29b5_0 939 KB conda-forge 2025-05-07T19:45:51.3424521Z libcufile-dev-1.13.0.11 | h5888daf_0 35 KB conda-forge 2025-05-07T19:45:51.3425023Z libcurand-10.3.9.55 | hbd13f7d_0 43.6 MB conda-forge 2025-05-07T19:45:51.3425496Z libcurand-dev-10.3.9.55 | h5888daf_0 265 KB conda-forge 2025-05-07T19:45:51.3426000Z libcusolver-11.7.2.55 | h9ab20c4_0 156.9 MB conda-forge 2025-05-07T19:45:51.3426492Z libcusolver-dev-11.7.2.55 | h9ab20c4_0 59 KB conda-forge 2025-05-07T19:45:51.3427001Z libcusparse-12.5.7.53 | hbd13f7d_0 164.9 MB conda-forge 2025-05-07T19:45:51.3427493Z libcusparse-dev-12.5.7.53 | h5888daf_0 51 KB conda-forge 2025-05-07T19:45:51.3428001Z libgcrypt-lib-1.11.0 | hb9d3cd8_2 572 KB conda-forge 2025-05-07T19:45:51.3428483Z libglvnd-1.7.0 | ha4b6fd6_2 129 KB conda-forge 2025-05-07T19:45:51.3428932Z libgpg-error-1.55 | h3f2d84a_0 305 KB conda-forge 2025-05-07T19:45:51.3429403Z libnl-3.11.0 | hb9d3cd8_0 724 KB conda-forge 2025-05-07T19:45:51.3429834Z libnpp-12.3.3.65 | hbd13f7d_0 130.6 MB conda-forge 2025-05-07T19:45:51.3430405Z libnpp-dev-12.3.3.65 | h5888daf_0 443 KB conda-forge 2025-05-07T19:45:51.3430827Z libnuma-2.0.18 | h4ab18f5_2 42 KB conda-forge 2025-05-07T19:45:51.3431272Z libnvfatbin-12.8.55 | hbd13f7d_0 793 KB conda-forge 2025-05-07T19:45:51.3431871Z libnvfatbin-dev-12.8.55 | h5888daf_0 26 KB conda-forge 2025-05-07T19:45:51.3432325Z libnvjitlink-12.8.61 | hbd13f7d_0 28.7 MB conda-forge 2025-05-07T19:45:51.3432808Z libnvjitlink-dev-12.8.61 | h5888daf_0 25 KB conda-forge 2025-05-07T19:45:51.3433263Z libnvjpeg-12.3.5.57 | h97fd463_0 3.0 MB conda-forge 2025-05-07T19:45:51.3433721Z libnvjpeg-dev-12.3.5.57 | ha770c72_0 31 KB conda-forge 2025-05-07T19:45:51.3434172Z libopengl-1.7.0 | ha4b6fd6_2 50 KB conda-forge 2025-05-07T19:45:51.3434593Z libsqlite-3.49.2 | hee588c1_0 895 KB conda-forge 
2025-05-07T19:45:51.3435118Z libsystemd0-257.4 | h4e0b6ca_1 477 KB conda-forge 2025-05-07T19:45:51.3435551Z libudev1-257.4 | hbe16f8c_1 141 KB conda-forge 2025-05-07T19:45:51.3435989Z libxkbcommon-1.9.2 | h65c71a3_0 660 KB conda-forge 2025-05-07T19:45:51.3436418Z libxkbfile-1.1.0 | h166bdaf_1 111 KB conda-forge 2025-05-07T19:45:51.3436841Z libxml2-2.13.8 | h4bc477f_0 675 KB conda-forge 2025-05-07T19:45:51.3437241Z lz4-c-1.10.0 | h5888daf_1 163 KB conda-forge 2025-05-07T19:45:51.3437671Z nsight-compute-2025.1.0.14 | hb5ebaad_0 320.6 MB conda-forge 2025-05-07T19:45:51.3438115Z nspr-4.36 | h5888daf_0 225 KB conda-forge 2025-05-07T19:45:51.3438483Z nss-3.111 | h159eef7_0 1.9 MB conda-forge 2025-05-07T19:45:51.3438875Z ocl-icd-2.3.3 | hb9d3cd8_0 104 KB conda-forge 2025-05-07T19:45:51.3439327Z opencl-headers-2024.10.24 | h5888daf_0 53 KB conda-forge 2025-05-07T19:45:51.3439761Z rdma-core-57.0 | h5888daf_0 1.2 MB conda-forge 2025-05-07T19:45:51.3440176Z sqlite-3.49.2 | h9eae976_0 840 KB conda-forge 2025-05-07T19:45:51.3440570Z wayland-1.23.1 | h3e06ad9_0 314 KB conda-forge 2025-05-07T19:45:51.3440985Z xcb-util-0.4.1 | hb711507_2 19 KB conda-forge 2025-05-07T19:45:51.3441408Z xcb-util-cursor-0.1.5 | hb9d3cd8_0 20 KB conda-forge 2025-05-07T19:45:51.3441874Z xcb-util-image-0.4.0 | hb711507_2 24 KB conda-forge 2025-05-07T19:45:51.3442342Z xcb-util-keysyms-0.4.1 | hb711507_0 14 KB conda-forge 2025-05-07T19:45:51.3442812Z xcb-util-renderutil-0.3.10 | hb711507_0 17 KB conda-forge 2025-05-07T19:45:51.3443287Z xcb-util-wm-0.4.2 | hb711507_0 50 KB conda-forge 2025-05-07T19:45:51.3443725Z xkeyboard-config-2.44 | hb9d3cd8_0 384 KB conda-forge 2025-05-07T19:45:51.3444210Z xorg-libxcomposite-0.4.6 | hb9d3cd8_2 13 KB conda-forge 2025-05-07T19:45:51.3444682Z xorg-libxdamage-1.1.6 | hb9d3cd8_0 13 KB conda-forge 2025-05-07T19:45:51.3445110Z ------------------------------------------------------------ 2025-05-07T19:45:51.3445461Z Total: 1.86 GB 2025-05-07T19:45:51.3445668Z 
2025-05-07T19:45:51.3445792Z The following NEW packages will be INSTALLED: 2025-05-07T19:45:51.3446024Z 2025-05-07T19:45:51.3446191Z attr conda-forge/linux-64::attr-2.5.1-h166bdaf_1 2025-05-07T19:45:51.3446599Z binutils conda-forge/linux-64::binutils-2.40-h4852527_7 2025-05-07T19:45:51.3447059Z c-compiler conda-forge/linux-64::c-compiler-1.5.2-h0b41bf4_0 2025-05-07T19:45:51.3447502Z cuda conda-forge/noarch::cuda-12.8.0-ha804496_0 2025-05-07T19:45:51.3447959Z cuda-cccl_linux-64 conda-forge/noarch::cuda-cccl_linux-64-12.8.55-ha770c72_1 2025-05-07T19:45:51.3448554Z cuda-command-line~ conda-forge/linux-64::cuda-command-line-tools-12.8.0-ha770c72_0 2025-05-07T19:45:51.3449200Z cuda-compiler conda-forge/noarch::cuda-compiler-12.8.0-hbad6d8a_0 2025-05-07T19:45:51.3449755Z cuda-crt-dev_linu~ conda-forge/noarch::cuda-crt-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:45:51.3450324Z cuda-crt-tools conda-forge/linux-64::cuda-crt-tools-12.8.61-ha770c72_1 2025-05-07T19:45:51.3450828Z cuda-cudart conda-forge/linux-64::cuda-cudart-12.8.57-h5888daf_1 2025-05-07T19:45:51.3451357Z cuda-cudart-dev conda-forge/linux-64::cuda-cudart-dev-12.8.57-h5888daf_1 2025-05-07T19:45:51.3451928Z cuda-cudart-dev_l~ conda-forge/noarch::cuda-cudart-dev_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:45:51.3452606Z cuda-cudart-static conda-forge/linux-64::cuda-cudart-static-12.8.57-h5888daf_1 2025-05-07T19:45:51.3453229Z cuda-cudart-stati~ conda-forge/noarch::cuda-cudart-static_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:45:51.3453819Z cuda-cudart_linux~ conda-forge/noarch::cuda-cudart_linux-64-12.8.57-h3f2d84a_1 2025-05-07T19:45:51.3454385Z cuda-cuobjdump conda-forge/linux-64::cuda-cuobjdump-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3454888Z cuda-cupti conda-forge/linux-64::cuda-cupti-12.8.57-hbd13f7d_0 2025-05-07T19:45:51.3455400Z cuda-cupti-dev conda-forge/linux-64::cuda-cupti-dev-12.8.57-h5888daf_0 2025-05-07T19:45:51.3455939Z cuda-cuxxfilt conda-forge/linux-64::cuda-cuxxfilt-12.8.55-hbd13f7d_0 
2025-05-07T19:45:51.3456466Z cuda-driver-dev conda-forge/linux-64::cuda-driver-dev-12.8.57-h5888daf_1 2025-05-07T19:45:51.3457049Z cuda-driver-dev_l~ conda-forge/noarch::cuda-driver-dev_linux-64-12.8.90-h3f2d84a_1 2025-05-07T19:45:51.3457564Z cuda-gdb conda-forge/linux-64::cuda-gdb-12.8.55-h50b4baa_0 2025-05-07T19:45:51.3458061Z cuda-libraries conda-forge/linux-64::cuda-libraries-12.8.0-ha770c72_0 2025-05-07T19:45:51.3458626Z cuda-libraries-dev conda-forge/linux-64::cuda-libraries-dev-12.8.0-ha770c72_0 2025-05-07T19:45:51.3459157Z cuda-nsight conda-forge/linux-64::cuda-nsight-12.8.55-h7938cbb_0 2025-05-07T19:45:51.3459641Z cuda-nvcc conda-forge/linux-64::cuda-nvcc-12.8.61-hcdd1206_0 2025-05-07T19:45:51.3460148Z cuda-nvcc-dev_lin~ conda-forge/noarch::cuda-nvcc-dev_linux-64-12.8.61-he91c749_1 2025-05-07T19:45:51.3460708Z cuda-nvcc-impl conda-forge/linux-64::cuda-nvcc-impl-12.8.61-h85509e4_1 2025-05-07T19:45:51.3461248Z cuda-nvcc-tools conda-forge/linux-64::cuda-nvcc-tools-12.8.61-he02047a_1 2025-05-07T19:45:51.3461787Z cuda-nvcc_linux-64 conda-forge/linux-64::cuda-nvcc_linux-64-12.8.61-h04802cd_0 2025-05-07T19:45:51.3462330Z cuda-nvdisasm conda-forge/linux-64::cuda-nvdisasm-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3462837Z cuda-nvml-dev conda-forge/linux-64::cuda-nvml-dev-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3463437Z cuda-nvprof conda-forge/linux-64::cuda-nvprof-12.8.57-hbd13f7d_0 2025-05-07T19:45:51.3464145Z cuda-nvprune conda-forge/linux-64::cuda-nvprune-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3464722Z cuda-nvrtc conda-forge/linux-64::cuda-nvrtc-12.8.61-hbd13f7d_0 2025-05-07T19:45:51.3465267Z cuda-nvrtc-dev conda-forge/linux-64::cuda-nvrtc-dev-12.8.61-h5888daf_0 2025-05-07T19:45:51.3465784Z cuda-nvtx conda-forge/linux-64::cuda-nvtx-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3466346Z cuda-nvvm-dev_lin~ conda-forge/noarch::cuda-nvvm-dev_linux-64-12.8.61-ha770c72_1 2025-05-07T19:45:51.3466954Z cuda-nvvm-impl conda-forge/linux-64::cuda-nvvm-impl-12.8.61-he02047a_1 
2025-05-07T19:45:51.3467525Z cuda-nvvm-tools conda-forge/linux-64::cuda-nvvm-tools-12.8.61-he02047a_1 2025-05-07T19:45:51.3468077Z cuda-nvvp conda-forge/linux-64::cuda-nvvp-12.8.57-hbd13f7d_0 2025-05-07T19:45:51.3468582Z cuda-opencl conda-forge/linux-64::cuda-opencl-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3469153Z cuda-opencl-dev conda-forge/linux-64::cuda-opencl-dev-12.8.55-h5888daf_0 2025-05-07T19:45:51.3469765Z cuda-profiler-api conda-forge/linux-64::cuda-profiler-api-12.8.55-h7938cbb_0 2025-05-07T19:45:51.3470523Z cuda-runtime conda-forge/noarch::cuda-runtime-12.8.0-ha804496_0 2025-05-07T19:45:51.3471101Z cuda-sanitizer-api conda-forge/linux-64::cuda-sanitizer-api-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3471649Z cuda-toolkit conda-forge/noarch::cuda-toolkit-12.8.0-ha804496_0 2025-05-07T19:45:51.3472160Z cuda-tools conda-forge/linux-64::cuda-tools-12.8.0-ha770c72_0 2025-05-07T19:45:51.3472673Z cuda-version conda-forge/noarch::cuda-version-12.8-h5d125a7_3 2025-05-07T19:45:51.3473208Z cuda-visual-tools conda-forge/linux-64::cuda-visual-tools-12.8.0-ha770c72_0 2025-05-07T19:45:51.3473778Z cxx-compiler conda-forge/linux-64::cxx-compiler-1.5.2-hf52228f_0 2025-05-07T19:45:51.3474317Z dbus conda-forge/linux-64::dbus-1.13.6-h5008d03_3 2025-05-07T19:45:51.3474747Z gcc conda-forge/linux-64::gcc-11.4.0-h602e360_13 2025-05-07T19:45:51.3475215Z gds-tools conda-forge/linux-64::gds-tools-1.13.0.11-h5888daf_0 2025-05-07T19:45:51.3475652Z gmp conda-forge/linux-64::gmp-6.3.0-hac33072_2 2025-05-07T19:45:51.3476071Z gxx conda-forge/linux-64::gxx-11.4.0-h602e360_13 2025-05-07T19:45:51.3476482Z libcap conda-forge/linux-64::libcap-2.75-h39aace5_0 2025-05-07T19:45:51.3476971Z libcublas conda-forge/linux-64::libcublas-12.8.3.14-h9ab20c4_0 2025-05-07T19:45:51.3477519Z libcublas-dev conda-forge/linux-64::libcublas-dev-12.8.3.14-h9ab20c4_0 2025-05-07T19:45:51.3478034Z libcufft conda-forge/linux-64::libcufft-11.3.3.41-hbd13f7d_0 2025-05-07T19:45:51.3478563Z libcufft-dev 
conda-forge/linux-64::libcufft-dev-11.3.3.41-h5888daf_0 2025-05-07T19:45:51.3479076Z libcufile conda-forge/linux-64::libcufile-1.13.0.11-h12f29b5_0 2025-05-07T19:45:51.3479630Z libcufile-dev conda-forge/linux-64::libcufile-dev-1.13.0.11-h5888daf_0 2025-05-07T19:45:51.3480176Z libcurand conda-forge/linux-64::libcurand-10.3.9.55-hbd13f7d_0 2025-05-07T19:45:51.3480699Z libcurand-dev conda-forge/linux-64::libcurand-dev-10.3.9.55-h5888daf_0 2025-05-07T19:45:51.3481259Z libcusolver conda-forge/linux-64::libcusolver-11.7.2.55-h9ab20c4_0 2025-05-07T19:45:51.3481812Z libcusolver-dev conda-forge/linux-64::libcusolver-dev-11.7.2.55-h9ab20c4_0 2025-05-07T19:45:51.3482388Z libcusparse conda-forge/linux-64::libcusparse-12.5.7.53-hbd13f7d_0 2025-05-07T19:45:51.3482962Z libcusparse-dev conda-forge/linux-64::libcusparse-dev-12.5.7.53-h5888daf_0 2025-05-07T19:45:51.3483504Z libgcrypt-lib conda-forge/linux-64::libgcrypt-lib-1.11.0-hb9d3cd8_2 2025-05-07T19:45:51.3484004Z libglvnd conda-forge/linux-64::libglvnd-1.7.0-ha4b6fd6_2 2025-05-07T19:45:51.3484469Z libgpg-error conda-forge/linux-64::libgpg-error-1.55-h3f2d84a_0 2025-05-07T19:45:51.3485122Z libnl conda-forge/linux-64::libnl-3.11.0-hb9d3cd8_0 2025-05-07T19:45:51.3485580Z libnpp conda-forge/linux-64::libnpp-12.3.3.65-hbd13f7d_0 2025-05-07T19:45:51.3486063Z libnpp-dev conda-forge/linux-64::libnpp-dev-12.3.3.65-h5888daf_0 2025-05-07T19:45:51.3486558Z libnuma conda-forge/linux-64::libnuma-2.0.18-h4ab18f5_2 2025-05-07T19:45:51.3487037Z libnvfatbin conda-forge/linux-64::libnvfatbin-12.8.55-hbd13f7d_0 2025-05-07T19:45:51.3487589Z libnvfatbin-dev conda-forge/linux-64::libnvfatbin-dev-12.8.55-h5888daf_0 2025-05-07T19:45:51.3488149Z libnvjitlink conda-forge/linux-64::libnvjitlink-12.8.61-hbd13f7d_0 2025-05-07T19:45:51.3488700Z libnvjitlink-dev conda-forge/linux-64::libnvjitlink-dev-12.8.61-h5888daf_0 2025-05-07T19:45:51.3489372Z libnvjpeg conda-forge/linux-64::libnvjpeg-12.3.5.57-h97fd463_0 2025-05-07T19:45:51.3489877Z libnvjpeg-dev 
conda-forge/linux-64::libnvjpeg-dev-12.3.5.57-ha770c72_0 2025-05-07T19:45:51.3490378Z libopengl conda-forge/linux-64::libopengl-1.7.0-ha4b6fd6_2 2025-05-07T19:45:51.3490950Z libsystemd0 conda-forge/linux-64::libsystemd0-257.4-h4e0b6ca_1 2025-05-07T19:45:51.3491413Z libudev1 conda-forge/linux-64::libudev1-257.4-hbe16f8c_1 2025-05-07T19:45:51.3491912Z libxkbcommon conda-forge/linux-64::libxkbcommon-1.9.2-h65c71a3_0 2025-05-07T19:45:51.3492401Z libxkbfile conda-forge/linux-64::libxkbfile-1.1.0-h166bdaf_1 2025-05-07T19:45:51.3492884Z libxml2 conda-forge/linux-64::libxml2-2.13.8-h4bc477f_0 2025-05-07T19:45:51.3493327Z lz4-c conda-forge/linux-64::lz4-c-1.10.0-h5888daf_1 2025-05-07T19:45:51.3513452Z nsight-compute conda-forge/linux-64::nsight-compute-2025.1.0.14-hb5ebaad_0 2025-05-07T19:45:51.3514016Z nspr conda-forge/linux-64::nspr-4.36-h5888daf_0 2025-05-07T19:45:51.3514694Z nss conda-forge/linux-64::nss-3.111-h159eef7_0 2025-05-07T19:45:51.3515147Z ocl-icd conda-forge/linux-64::ocl-icd-2.3.3-hb9d3cd8_0 2025-05-07T19:45:51.3515673Z opencl-headers conda-forge/linux-64::opencl-headers-2024.10.24-h5888daf_0 2025-05-07T19:45:51.3516240Z rdma-core conda-forge/linux-64::rdma-core-57.0-h5888daf_0 2025-05-07T19:45:51.3516748Z wayland conda-forge/linux-64::wayland-1.23.1-h3e06ad9_0 2025-05-07T19:45:51.3517220Z xcb-util conda-forge/linux-64::xcb-util-0.4.1-hb711507_2 2025-05-07T19:45:51.3517751Z xcb-util-cursor conda-forge/linux-64::xcb-util-cursor-0.1.5-hb9d3cd8_0 2025-05-07T19:45:51.3518312Z xcb-util-image conda-forge/linux-64::xcb-util-image-0.4.0-hb711507_2 2025-05-07T19:45:51.3518892Z xcb-util-keysyms conda-forge/linux-64::xcb-util-keysyms-0.4.1-hb711507_0 2025-05-07T19:45:51.3519500Z xcb-util-renderut~ conda-forge/linux-64::xcb-util-renderutil-0.3.10-hb711507_0 2025-05-07T19:45:51.3520092Z xcb-util-wm conda-forge/linux-64::xcb-util-wm-0.4.2-hb711507_0 2025-05-07T19:45:51.3520646Z xkeyboard-config conda-forge/linux-64::xkeyboard-config-2.44-hb9d3cd8_0 
2025-05-07T19:45:51.3521252Z xorg-libxcomposite conda-forge/linux-64::xorg-libxcomposite-0.4.6-hb9d3cd8_2 2025-05-07T19:45:51.3521877Z xorg-libxdamage conda-forge/linux-64::xorg-libxdamage-1.1.6-hb9d3cd8_0 2025-05-07T19:45:51.3522218Z 2025-05-07T19:45:51.3522344Z The following packages will be UPDATED: 2025-05-07T19:45:51.3522579Z 2025-05-07T19:45:51.3522753Z libsqlite 3.46.0-hde9e2c9_0 --> 3.49.2-hee588c1_0 2025-05-07T19:45:51.3523202Z sqlite 3.46.0-h6d4b2fc_0 --> 3.49.2-h9eae976_0 2025-05-07T19:45:51.3523465Z 2025-05-07T19:45:51.3523503Z 2025-05-07T19:45:51.3523507Z 2025-05-07T19:45:51.3523654Z Downloading and Extracting Packages: ...working... 2025-05-07T19:45:51.3524062Z libcublas-12.8.3.14 | 460.2 MB | | 0% 2025-05-07T19:45:51.3524316Z 2025-05-07T19:45:51.3524672Z nsight-compute-2025. | 320.6 MB | | 0%  2025-05-07T19:45:51.3524932Z 2025-05-07T19:45:51.3524936Z 2025-05-07T19:45:51.3525175Z libcusparse-12.5.7.5 | 164.9 MB | | 0%  2025-05-07T19:45:51.3525462Z 2025-05-07T19:45:51.3525465Z 2025-05-07T19:45:51.3525469Z 2025-05-07T19:45:51.3525710Z libcusolver-11.7.2.5 | 156.9 MB | | 0%  2025-05-07T19:45:51.3525983Z 2025-05-07T19:45:51.3525987Z 2025-05-07T19:45:51.3525991Z 2025-05-07T19:45:51.3525994Z 2025-05-07T19:45:51.3536341Z libcufft-11.3.3.41 | 147.4 MB | | 0%  2025-05-07T19:45:51.3536635Z 2025-05-07T19:45:51.3536639Z 2025-05-07T19:45:51.3536643Z 2025-05-07T19:45:51.3536647Z 2025-05-07T19:45:51.3536650Z 2025-05-07T19:45:51.3536909Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:45:51.3537184Z 2025-05-07T19:45:51.3537187Z 2025-05-07T19:45:51.3537191Z 2025-05-07T19:45:51.3537195Z 2025-05-07T19:45:51.3537205Z 2025-05-07T19:45:51.3537208Z 2025-05-07T19:45:51.3537467Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:45:51.3537776Z 2025-05-07T19:45:51.3537779Z 2025-05-07T19:45:51.3537915Z 2025-05-07T19:45:51.3537919Z 2025-05-07T19:45:51.3537923Z 2025-05-07T19:45:51.3537926Z 2025-05-07T19:45:51.3537930Z 2025-05-07T19:45:51.3538205Z cuda-nvvp-12.8.57 | 112.4 MB 
| | 0%  2025-05-07T19:45:51.3538505Z 2025-05-07T19:45:51.3538509Z 2025-05-07T19:45:51.3538512Z 2025-05-07T19:45:51.3538516Z 2025-05-07T19:45:51.3538519Z 2025-05-07T19:45:51.3538523Z 2025-05-07T19:45:51.3538526Z 2025-05-07T19:45:51.3538529Z 2025-05-07T19:45:51.3538784Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:45:51.3539094Z 2025-05-07T19:45:51.3539097Z 2025-05-07T19:45:51.3539101Z 2025-05-07T19:45:51.3539104Z 2025-05-07T19:45:51.3539108Z 2025-05-07T19:45:51.3539111Z 2025-05-07T19:45:51.3539114Z 2025-05-07T19:45:51.3539199Z 2025-05-07T19:45:51.3539203Z 2025-05-07T19:45:51.3539484Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:45:51.3539795Z 2025-05-07T19:45:51.3539804Z 2025-05-07T19:45:51.3539808Z 2025-05-07T19:45:51.3539811Z 2025-05-07T19:45:51.3539814Z 2025-05-07T19:45:51.3539818Z 2025-05-07T19:45:51.3539821Z 2025-05-07T19:45:51.3539824Z 2025-05-07T19:45:51.3539828Z 2025-05-07T19:45:51.3539831Z 2025-05-07T19:45:51.3540377Z gds-tools-1.13.0.11 | 37.9 MB | | 0%  2025-05-07T19:45:51.3540690Z 2025-05-07T19:45:51.3540693Z 2025-05-07T19:45:51.3540696Z 2025-05-07T19:45:51.3540702Z 2025-05-07T19:45:51.3540705Z 2025-05-07T19:45:51.3540708Z 2025-05-07T19:45:51.3540712Z 2025-05-07T19:45:51.3540725Z 2025-05-07T19:45:51.3540729Z 2025-05-07T19:45:51.3540732Z 2025-05-07T19:45:51.3540736Z 2025-05-07T19:45:51.3542467Z libnvjitlink-12.8.61 | 28.7 MB | | 0%  2025-05-07T19:45:51.3542777Z 2025-05-07T19:45:51.3542780Z 2025-05-07T19:45:51.3542784Z 2025-05-07T19:45:51.3542787Z 2025-05-07T19:45:51.3542791Z 2025-05-07T19:45:51.3542794Z 2025-05-07T19:45:51.3542798Z 2025-05-07T19:45:51.3542805Z 2025-05-07T19:45:51.3542809Z 2025-05-07T19:45:51.3542814Z 2025-05-07T19:45:51.3542835Z 2025-05-07T19:45:51.3542838Z 2025-05-07T19:45:51.3543133Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0%  2025-05-07T19:45:51.3543590Z 2025-05-07T19:45:51.3543593Z 2025-05-07T19:45:51.3543597Z 2025-05-07T19:45:51.3543601Z 2025-05-07T19:45:51.3543604Z 2025-05-07T19:45:51.3543608Z 2025-05-07T19:45:51.3543611Z 
2025-05-07T19:45:51.3543614Z 2025-05-07T19:45:51.3543618Z 2025-05-07T19:45:51.3543621Z 2025-05-07T19:45:51.3543625Z 2025-05-07T19:45:51.3543646Z 2025-05-07T19:45:51.3543649Z 2025-05-07T19:45:51.3546306Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0%  2025-05-07T19:45:51.3546625Z 2025-05-07T19:45:51.3546633Z 2025-05-07T19:45:51.3546637Z 2025-05-07T19:45:51.3546640Z 2025-05-07T19:45:51.3546644Z 2025-05-07T19:45:51.3546647Z 2025-05-07T19:45:51.3546682Z 2025-05-07T19:45:51.3546685Z 2025-05-07T19:45:51.3546693Z 2025-05-07T19:45:51.3546697Z 2025-05-07T19:45:51.3546701Z 2025-05-07T19:45:51.3546705Z 2025-05-07T19:45:51.3546708Z 2025-05-07T19:45:51.3546711Z 2025-05-07T19:45:51.3547282Z cuda-nvvm-impl-12.8. | 20.8 MB | | 0%  2025-05-07T19:45:51.3547621Z 2025-05-07T19:45:51.3547624Z 2025-05-07T19:45:51.3547628Z 2025-05-07T19:45:51.3547631Z 2025-05-07T19:45:51.3547635Z 2025-05-07T19:45:51.3547655Z 2025-05-07T19:45:51.3547658Z 2025-05-07T19:45:51.3547661Z 2025-05-07T19:45:51.3547665Z 2025-05-07T19:45:51.3547668Z 2025-05-07T19:45:51.3547671Z 2025-05-07T19:45:51.3547675Z 2025-05-07T19:45:51.3547678Z 2025-05-07T19:45:51.3547682Z 2025-05-07T19:45:51.3547686Z 2025-05-07T19:45:51.3558496Z cuda-nvcc-dev_linux- | 12.7 MB | | 0%  2025-05-07T19:45:51.3559494Z 2025-05-07T19:45:51.3559505Z 2025-05-07T19:45:51.3559515Z 2025-05-07T19:45:51.3559525Z 2025-05-07T19:45:51.3559536Z 2025-05-07T19:45:51.3559724Z 2025-05-07T19:45:51.3559735Z 2025-05-07T19:45:51.3559745Z 2025-05-07T19:45:51.3559755Z 2025-05-07T19:45:51.3559766Z 2025-05-07T19:45:51.3559776Z 2025-05-07T19:45:51.3559786Z 2025-05-07T19:45:51.3559829Z 2025-05-07T19:45:51.3559840Z 2025-05-07T19:45:51.3559851Z 2025-05-07T19:45:51.3559861Z 2025-05-07T19:45:51.3560794Z cuda-sanitizer-api-1 | 8.8 MB | | 0%  2025-05-07T19:45:51.3561780Z 2025-05-07T19:45:51.3561791Z 2025-05-07T19:45:51.3561801Z 2025-05-07T19:45:51.3561811Z 2025-05-07T19:45:51.3561821Z 2025-05-07T19:45:51.3561831Z 2025-05-07T19:45:51.3561864Z 2025-05-07T19:45:51.3561874Z 
2025-05-07T19:45:51.3561884Z 2025-05-07T19:45:51.3561895Z 2025-05-07T19:45:51.3561905Z 2025-05-07T19:45:51.3562121Z 2025-05-07T19:45:51.3562126Z 2025-05-07T19:45:51.3562129Z 2025-05-07T19:45:51.3562133Z 2025-05-07T19:45:51.3562136Z 2025-05-07T19:45:51.3562140Z 2025-05-07T19:45:51.3562460Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0%  2025-05-07T19:45:51.3562813Z 2025-05-07T19:45:51.3562817Z 2025-05-07T19:45:51.3562821Z 2025-05-07T19:45:51.3562824Z 2025-05-07T19:45:51.3562828Z 2025-05-07T19:45:51.3562831Z 2025-05-07T19:45:51.3562835Z 2025-05-07T19:45:51.3562838Z 2025-05-07T19:45:51.3562841Z 2025-05-07T19:45:51.3562845Z 2025-05-07T19:45:51.3562868Z 2025-05-07T19:45:51.3562872Z 2025-05-07T19:45:51.3562875Z 2025-05-07T19:45:51.3562879Z 2025-05-07T19:45:51.3562882Z 2025-05-07T19:45:51.3562885Z 2025-05-07T19:45:51.3562889Z 2025-05-07T19:45:51.3562892Z 2025-05-07T19:45:51.3563225Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%  2025-05-07T19:45:51.3563556Z 2025-05-07T19:45:51.3563564Z 2025-05-07T19:45:51.3563567Z 2025-05-07T19:45:51.3563570Z 2025-05-07T19:45:51.3563574Z 2025-05-07T19:45:51.3563577Z 2025-05-07T19:45:51.3563581Z 2025-05-07T19:45:51.3563584Z 2025-05-07T19:45:51.3563591Z 2025-05-07T19:45:51.3563595Z 2025-05-07T19:45:51.3563598Z 2025-05-07T19:45:51.3563620Z 2025-05-07T19:45:51.3563624Z 2025-05-07T19:45:51.3563627Z 2025-05-07T19:45:51.3563631Z 2025-05-07T19:45:51.3563634Z 2025-05-07T19:45:51.3563638Z 2025-05-07T19:45:51.3563641Z 2025-05-07T19:45:51.3563645Z 2025-05-07T19:45:51.4492112Z ... (more hidden) ... 
2025-05-07T19:45:51.4504963Z libcublas-12.8.3.14 | 460.2 MB | | 0% 2025-05-07T19:45:51.4505724Z 2025-05-07T19:45:51.4505738Z 2025-05-07T19:45:51.4518045Z libcusparse-12.5.7.5 | 164.9 MB | 1 | 1%  2025-05-07T19:45:51.4518346Z 2025-05-07T19:45:51.4518349Z 2025-05-07T19:45:51.4518826Z 2025-05-07T19:45:51.4522052Z libcusolver-11.7.2.5 | 156.9 MB | 2 | 2%  2025-05-07T19:45:51.4522351Z 2025-05-07T19:45:51.4522355Z 2025-05-07T19:45:51.4522360Z 2025-05-07T19:45:51.4522370Z 2025-05-07T19:45:51.5491352Z libcufft-11.3.3.41 | 147.4 MB | 2 | 2%  2025-05-07T19:45:51.5505508Z libcublas-12.8.3.14 | 460.2 MB | 1 | 2% 2025-05-07T19:45:51.5505764Z 2025-05-07T19:45:51.5505778Z 2025-05-07T19:45:51.5519693Z libcusparse-12.5.7.5 | 164.9 MB | 3 | 4%  2025-05-07T19:45:51.5519980Z 2025-05-07T19:45:51.5519994Z 2025-05-07T19:45:51.5519998Z 2025-05-07T19:45:51.5522497Z libcusolver-11.7.2.5 | 156.9 MB | 7 | 7%  2025-05-07T19:45:51.5522770Z 2025-05-07T19:45:51.5522795Z 2025-05-07T19:45:51.5522799Z 2025-05-07T19:45:51.5522803Z 2025-05-07T19:45:51.5914495Z libcufft-11.3.3.41 | 147.4 MB | 7 | 8%  2025-05-07T19:45:51.5914796Z 2025-05-07T19:45:51.6491378Z nsight-compute-2025. | 320.6 MB | | 0%  2025-05-07T19:45:51.6511996Z libcublas-12.8.3.14 | 460.2 MB | 3 | 3% 2025-05-07T19:45:51.6512267Z 2025-05-07T19:45:51.6513471Z 2025-05-07T19:45:51.6521014Z libcusparse-12.5.7.5 | 164.9 MB | 6 | 7%  2025-05-07T19:45:51.6521293Z 2025-05-07T19:45:51.6521505Z 2025-05-07T19:45:51.6521510Z 2025-05-07T19:45:51.6523850Z libcusolver-11.7.2.5 | 156.9 MB | #1 | 11%  2025-05-07T19:45:51.6524125Z 2025-05-07T19:45:51.6524129Z 2025-05-07T19:45:51.6524133Z 2025-05-07T19:45:51.6524145Z 2025-05-07T19:45:51.6919676Z libcufft-11.3.3.41 | 147.4 MB | #2 | 12%  2025-05-07T19:45:51.6919984Z 2025-05-07T19:45:51.7493285Z nsight-compute-2025. 
| 320.6 MB | 1 | 2%  2025-05-07T19:45:51.7624976Z libcublas-12.8.3.14 | 460.2 MB | 4 | 4% 2025-05-07T19:45:51.7625275Z 2025-05-07T19:45:51.7625279Z 2025-05-07T19:45:51.7625283Z 2025-05-07T19:45:51.7642017Z libcusolver-11.7.2.5 | 156.9 MB | #5 | 15%  2025-05-07T19:45:51.7642902Z 2025-05-07T19:45:51.7643318Z 2025-05-07T19:45:51.7643333Z 2025-05-07T19:45:51.7643343Z 2025-05-07T19:45:51.7805586Z libcufft-11.3.3.41 | 147.4 MB | #6 | 16%  2025-05-07T19:45:51.7805991Z 2025-05-07T19:45:51.7805996Z 2025-05-07T19:45:51.7919373Z libcusparse-12.5.7.5 | 164.9 MB | 9 | 9%  2025-05-07T19:45:51.7919682Z 2025-05-07T19:45:51.8563180Z nsight-compute-2025. | 320.6 MB | 3 | 3%  2025-05-07T19:45:51.8746154Z libcublas-12.8.3.14 | 460.2 MB | 5 | 6% 2025-05-07T19:45:51.8746521Z 2025-05-07T19:45:51.8746526Z 2025-05-07T19:45:51.8746530Z 2025-05-07T19:45:51.8747488Z 2025-05-07T19:45:51.8806407Z libcufft-11.3.3.41 | 147.4 MB | ## | 20%  2025-05-07T19:45:51.8806707Z 2025-05-07T19:45:51.8806711Z 2025-05-07T19:45:51.8849294Z libcusparse-12.5.7.5 | 164.9 MB | #2 | 12%  2025-05-07T19:45:51.8849683Z 2025-05-07T19:45:51.8849689Z 2025-05-07T19:45:51.8849693Z 2025-05-07T19:45:51.8919440Z libcusolver-11.7.2.5 | 156.9 MB | #9 | 19%  2025-05-07T19:45:51.8920683Z 2025-05-07T19:45:51.9744120Z nsight-compute-2025. | 320.6 MB | 4 | 5%  2025-05-07T19:45:51.9809418Z libcublas-12.8.3.14 | 460.2 MB | 6 | 7% 2025-05-07T19:45:51.9809767Z 2025-05-07T19:45:51.9809773Z 2025-05-07T19:45:51.9828942Z libcusparse-12.5.7.5 | 164.9 MB | #5 | 15%  2025-05-07T19:45:51.9829257Z 2025-05-07T19:45:51.9829262Z 2025-05-07T19:45:51.9829265Z 2025-05-07T19:45:51.9829275Z 2025-05-07T19:45:51.9862354Z libcufft-11.3.3.41 | 147.4 MB | ##4 | 24%  2025-05-07T19:45:51.9862672Z 2025-05-07T19:45:51.9862676Z 2025-05-07T19:45:51.9862681Z 2025-05-07T19:45:52.0115785Z libcusolver-11.7.2.5 | 156.9 MB | ##2 | 23%  2025-05-07T19:45:52.0116134Z 2025-05-07T19:45:52.0746340Z nsight-compute-2025. 
| 320.6 MB | 6 | 6%  2025-05-07T19:45:52.0865224Z libcublas-12.8.3.14 | 460.2 MB | 8 | 8% 2025-05-07T19:45:52.0866032Z 2025-05-07T19:45:52.0866086Z 2025-05-07T19:45:52.0866099Z 2025-05-07T19:45:52.0881799Z libcusolver-11.7.2.5 | 156.9 MB | ##7 | 28%  2025-05-07T19:45:52.0882125Z 2025-05-07T19:45:52.0882337Z 2025-05-07T19:45:52.0882362Z 2025-05-07T19:45:52.0882457Z 2025-05-07T19:45:52.0917568Z libcufft-11.3.3.41 | 147.4 MB | ##8 | 28%  2025-05-07T19:45:52.0917899Z 2025-05-07T19:45:52.0917914Z 2025-05-07T19:45:52.1115685Z libcusparse-12.5.7.5 | 164.9 MB | #8 | 18%  2025-05-07T19:45:52.1116022Z 2025-05-07T19:45:52.1911588Z nsight-compute-2025. | 320.6 MB | 7 | 8%  2025-05-07T19:45:52.1920504Z libcublas-12.8.3.14 | 460.2 MB | 9 | 9% 2025-05-07T19:45:52.1921270Z 2025-05-07T19:45:52.1921621Z 2025-05-07T19:45:52.1963953Z libcusparse-12.5.7.5 | 164.9 MB | ## | 21%  2025-05-07T19:45:52.1964258Z 2025-05-07T19:45:52.1964266Z 2025-05-07T19:45:52.1964272Z 2025-05-07T19:45:52.1964275Z 2025-05-07T19:45:52.2055245Z libcufft-11.3.3.41 | 147.4 MB | ###1 | 32%  2025-05-07T19:45:52.2056145Z 2025-05-07T19:45:52.2056159Z 2025-05-07T19:45:52.2056171Z 2025-05-07T19:45:52.2116055Z libcusolver-11.7.2.5 | 156.9 MB | ###1 | 32%  2025-05-07T19:45:52.2116665Z 2025-05-07T19:45:52.2913816Z nsight-compute-2025. | 320.6 MB | 9 | 9%  2025-05-07T19:45:52.3008010Z libcublas-12.8.3.14 | 460.2 MB | # | 10% 2025-05-07T19:45:52.3008339Z 2025-05-07T19:45:52.3008399Z 2025-05-07T19:45:52.3008405Z 2025-05-07T19:45:52.3008459Z 2025-05-07T19:45:52.3105908Z libcufft-11.3.3.41 | 147.4 MB | ###5 | 36%  2025-05-07T19:45:52.3106231Z 2025-05-07T19:45:52.3106236Z 2025-05-07T19:45:52.3106241Z 2025-05-07T19:45:52.3113163Z libcusolver-11.7.2.5 | 156.9 MB | ###5 | 35%  2025-05-07T19:45:52.3113464Z 2025-05-07T19:45:52.3113469Z 2025-05-07T19:45:52.3332970Z libcusparse-12.5.7.5 | 164.9 MB | ##3 | 24%  2025-05-07T19:45:52.3333314Z 2025-05-07T19:45:52.4107451Z nsight-compute-2025. 
| 320.6 MB | # | 11%  2025-05-07T19:45:52.4108322Z 2025-05-07T19:45:52.4108352Z 2025-05-07T19:45:52.4108364Z 2025-05-07T19:45:52.4113503Z libcusolver-11.7.2.5 | 156.9 MB | ####4 | 44%  2025-05-07T19:45:52.4113828Z 2025-05-07T19:45:52.4114805Z 2025-05-07T19:45:52.4334491Z libcusparse-12.5.7.5 | 164.9 MB | ##7 | 27%  2025-05-07T19:45:52.4334837Z 2025-05-07T19:45:52.4348834Z nsight-compute-2025. | 320.6 MB | #3 | 13%  2025-05-07T19:45:52.5116296Z libcublas-12.8.3.14 | 460.2 MB | #1 | 11% 2025-05-07T19:45:52.5117086Z 2025-05-07T19:45:52.5117114Z 2025-05-07T19:45:52.5174337Z libcusparse-12.5.7.5 | 164.9 MB | ### | 31%  2025-05-07T19:45:52.5175217Z 2025-05-07T19:45:52.5175231Z 2025-05-07T19:45:52.5175242Z 2025-05-07T19:45:52.5175252Z 2025-05-07T19:45:52.5350661Z libcufft-11.3.3.41 | 147.4 MB | ###9 | 39%  2025-05-07T19:45:52.5353089Z 2025-05-07T19:45:52.5356389Z nsight-compute-2025. | 320.6 MB | #5 | 15%  2025-05-07T19:45:52.5844703Z libcublas-12.8.3.14 | 460.2 MB | #2 | 12% 2025-05-07T19:45:52.5845233Z 2025-05-07T19:45:52.5845254Z 2025-05-07T19:45:52.5845261Z 2025-05-07T19:45:52.6127383Z libcusolver-11.7.2.5 | 156.9 MB | ####9 | 49%  2025-05-07T19:45:52.6127780Z 2025-05-07T19:45:52.6127786Z 2025-05-07T19:45:52.6173166Z libcusparse-12.5.7.5 | 164.9 MB | ###3 | 34%  2025-05-07T19:45:52.6173515Z 2025-05-07T19:45:52.6173521Z 2025-05-07T19:45:52.6173526Z 2025-05-07T19:45:52.6173529Z 2025-05-07T19:45:52.6360610Z libcufft-11.3.3.41 | 147.4 MB | ####2 | 42%  2025-05-07T19:45:52.6486004Z libcublas-12.8.3.14 | 460.2 MB | #3 | 13% 2025-05-07T19:45:52.6486547Z 2025-05-07T19:45:52.7226953Z nsight-compute-2025. 
| 320.6 MB | #6 | 17%  2025-05-07T19:45:52.7227280Z 2025-05-07T19:45:52.7227285Z 2025-05-07T19:45:52.7253101Z libcusparse-12.5.7.5 | 164.9 MB | ###6 | 37%  2025-05-07T19:45:52.7253439Z 2025-05-07T19:45:52.7253445Z 2025-05-07T19:45:52.7253466Z 2025-05-07T19:45:52.7253472Z 2025-05-07T19:45:52.7362240Z libcufft-11.3.3.41 | 147.4 MB | ####5 | 46%  2025-05-07T19:45:52.7446972Z libcublas-12.8.3.14 | 460.2 MB | #4 | 15% 2025-05-07T19:45:52.7447515Z 2025-05-07T19:45:52.7447533Z 2025-05-07T19:45:52.7447537Z 2025-05-07T19:45:52.7593017Z libcusolver-11.7.2.5 | 156.9 MB | #####3 | 54%  2025-05-07T19:45:52.7594548Z 2025-05-07T19:45:52.8255638Z nsight-compute-2025. | 320.6 MB | #8 | 18%  2025-05-07T19:45:52.8255951Z 2025-05-07T19:45:52.8255955Z 2025-05-07T19:45:52.8255958Z 2025-05-07T19:45:52.8255962Z 2025-05-07T19:45:52.8258627Z libcufft-11.3.3.41 | 147.4 MB | ####8 | 49%  2025-05-07T19:45:52.8258934Z 2025-05-07T19:45:52.8258940Z 2025-05-07T19:45:52.8362505Z libcusparse-12.5.7.5 | 164.9 MB | ###9 | 40%  2025-05-07T19:45:52.8656251Z libcublas-12.8.3.14 | 460.2 MB | #5 | 16% 2025-05-07T19:45:52.8656546Z 2025-05-07T19:45:52.8794411Z nsight-compute-2025. | 320.6 MB | ## | 20%  2025-05-07T19:45:52.8794712Z 2025-05-07T19:45:52.8794717Z 2025-05-07T19:45:52.8794982Z 2025-05-07T19:45:52.9256593Z libcusolver-11.7.2.5 | 156.9 MB | #####8 | 58%  2025-05-07T19:45:52.9256908Z 2025-05-07T19:45:52.9256914Z 2025-05-07T19:45:52.9256918Z 2025-05-07T19:45:52.9256961Z 2025-05-07T19:45:52.9366686Z libcufft-11.3.3.41 | 147.4 MB | #####1 | 52%  2025-05-07T19:45:52.9708207Z libcublas-12.8.3.14 | 460.2 MB | #6 | 17% 2025-05-07T19:45:52.9708521Z 2025-05-07T19:45:52.9883641Z nsight-compute-2025. 
| 320.6 MB | ##1 | 22%  2025-05-07T19:45:52.9883967Z 2025-05-07T19:45:52.9883972Z 2025-05-07T19:45:52.9883977Z 2025-05-07T19:45:53.0092846Z libcusolver-11.7.2.5 | 156.9 MB | ######1 | 62%  2025-05-07T19:45:53.0093239Z 2025-05-07T19:45:53.0093335Z 2025-05-07T19:45:53.0259436Z libcusparse-12.5.7.5 | 164.9 MB | ####2 | 43%  2025-05-07T19:45:53.0259880Z 2025-05-07T19:45:53.0259885Z 2025-05-07T19:45:53.0259890Z 2025-05-07T19:45:53.0259895Z 2025-05-07T19:45:53.0580956Z libcufft-11.3.3.41 | 147.4 MB | #####5 | 56%  2025-05-07T19:45:53.0711280Z libcublas-12.8.3.14 | 460.2 MB | #7 | 18% 2025-05-07T19:45:53.0712103Z 2025-05-07T19:45:53.0930812Z nsight-compute-2025. | 320.6 MB | ##3 | 23%  2025-05-07T19:45:53.0931117Z 2025-05-07T19:45:53.0931296Z 2025-05-07T19:45:53.0931302Z 2025-05-07T19:45:53.1149457Z libcusolver-11.7.2.5 | 156.9 MB | ######5 | 65%  2025-05-07T19:45:53.1149784Z 2025-05-07T19:45:53.1149789Z 2025-05-07T19:45:53.1260522Z libcusparse-12.5.7.5 | 164.9 MB | ####5 | 45%  2025-05-07T19:45:53.1260838Z 2025-05-07T19:45:53.1260842Z 2025-05-07T19:45:53.1260848Z 2025-05-07T19:45:53.1260853Z 2025-05-07T19:45:53.1713055Z libcufft-11.3.3.41 | 147.4 MB | #####9 | 60%  2025-05-07T19:45:53.1713367Z 2025-05-07T19:45:53.1810125Z nsight-compute-2025. | 320.6 MB | ##5 | 25%  2025-05-07T19:45:53.1934922Z libcublas-12.8.3.14 | 460.2 MB | #8 | 19% 2025-05-07T19:45:53.1935697Z 2025-05-07T19:45:53.1935763Z 2025-05-07T19:45:53.1935776Z 2025-05-07T19:45:53.2150421Z libcusolver-11.7.2.5 | 156.9 MB | ######9 | 69%  2025-05-07T19:45:53.2150731Z 2025-05-07T19:45:53.2150736Z 2025-05-07T19:45:53.2271467Z libcusparse-12.5.7.5 | 164.9 MB | ####8 | 48%  2025-05-07T19:45:53.2271787Z 2025-05-07T19:45:53.2271793Z 2025-05-07T19:45:53.2271796Z 2025-05-07T19:45:53.2272159Z 2025-05-07T19:45:53.2804782Z libcufft-11.3.3.41 | 147.4 MB | ######3 | 63%  2025-05-07T19:45:53.2805132Z 2025-05-07T19:45:53.2808902Z nsight-compute-2025. 
| 320.6 MB | ##7 | 27%  2025-05-07T19:45:53.3070172Z libcublas-12.8.3.14 | 460.2 MB | #9 | 20% 2025-05-07T19:45:53.3070507Z 2025-05-07T19:45:53.3070515Z 2025-05-07T19:45:53.3070827Z 2025-05-07T19:45:53.3153883Z libcusolver-11.7.2.5 | 156.9 MB | #######2 | 73%  2025-05-07T19:45:53.3154236Z 2025-05-07T19:45:53.3154242Z 2025-05-07T19:45:53.3349268Z libcusparse-12.5.7.5 | 164.9 MB | #####1 | 51%  2025-05-07T19:45:53.3349631Z 2025-05-07T19:45:53.3349638Z 2025-05-07T19:45:53.3349644Z 2025-05-07T19:45:53.3349649Z 2025-05-07T19:45:53.3828067Z libcufft-11.3.3.41 | 147.4 MB | ######6 | 67%  2025-05-07T19:45:53.3943595Z libcublas-12.8.3.14 | 460.2 MB | ## | 21% 2025-05-07T19:45:53.3943976Z 2025-05-07T19:45:53.4154253Z nsight-compute-2025. | 320.6 MB | ##8 | 29%  2025-05-07T19:45:53.4154552Z 2025-05-07T19:45:53.4154557Z 2025-05-07T19:45:53.4353126Z libcusparse-12.5.7.5 | 164.9 MB | #####3 | 54%  2025-05-07T19:45:53.4353455Z 2025-05-07T19:45:53.4353465Z 2025-05-07T19:45:53.4353470Z 2025-05-07T19:45:53.4456357Z libcusolver-11.7.2.5 | 156.9 MB | #######6 | 76%  2025-05-07T19:45:53.4456773Z 2025-05-07T19:45:53.4456915Z 2025-05-07T19:45:53.4456963Z 2025-05-07T19:45:53.4457000Z 2025-05-07T19:45:53.4831439Z libcufft-11.3.3.41 | 147.4 MB | ####### | 70%  2025-05-07T19:45:53.4972697Z libcublas-12.8.3.14 | 460.2 MB | ##1 | 22% 2025-05-07T19:45:53.4973367Z 2025-05-07T19:45:53.5155195Z nsight-compute-2025. 
| 320.6 MB | ### | 30%  2025-05-07T19:45:53.5155490Z 2025-05-07T19:45:53.5155497Z 2025-05-07T19:45:53.5441425Z libcusparse-12.5.7.5 | 164.9 MB | #####6 | 57%  2025-05-07T19:45:53.5442072Z 2025-05-07T19:45:53.5442095Z 2025-05-07T19:45:53.5442108Z 2025-05-07T19:45:53.5532350Z libcusolver-11.7.2.5 | 156.9 MB | #######9 | 79%  2025-05-07T19:45:53.5532664Z 2025-05-07T19:45:53.5532807Z 2025-05-07T19:45:53.5532817Z 2025-05-07T19:45:53.5532824Z 2025-05-07T19:45:53.5838126Z libcufft-11.3.3.41 | 147.4 MB | #######3 | 73%  2025-05-07T19:45:53.5986020Z libcublas-12.8.3.14 | 460.2 MB | ##3 | 23% 2025-05-07T19:45:53.5986774Z 2025-05-07T19:45:53.6161227Z nsight-compute-2025. | 320.6 MB | ###1 | 32%  2025-05-07T19:45:53.6162098Z 2025-05-07T19:45:53.6162112Z 2025-05-07T19:45:53.6517164Z libcusparse-12.5.7.5 | 164.9 MB | #####9 | 60%  2025-05-07T19:45:53.6517584Z 2025-05-07T19:45:53.6517589Z 2025-05-07T19:45:53.6517593Z 2025-05-07T19:45:53.6536925Z libcusolver-11.7.2.5 | 156.9 MB | ########2 | 83%  2025-05-07T19:45:53.6537568Z 2025-05-07T19:45:53.6537597Z 2025-05-07T19:45:53.6537611Z 2025-05-07T19:45:53.6537625Z 2025-05-07T19:45:53.6841093Z libcufft-11.3.3.41 | 147.4 MB | #######6 | 77%  2025-05-07T19:45:53.6987045Z libcublas-12.8.3.14 | 460.2 MB | ##4 | 24% 2025-05-07T19:45:53.6988310Z 2025-05-07T19:45:53.7517404Z nsight-compute-2025. 
| 320.6 MB | ###3 | 34%  2025-05-07T19:45:53.7517706Z 2025-05-07T19:45:53.7517752Z 2025-05-07T19:45:53.7517758Z 2025-05-07T19:45:53.7538077Z libcusolver-11.7.2.5 | 156.9 MB | ########6 | 86%  2025-05-07T19:45:53.7538412Z 2025-05-07T19:45:53.7538417Z 2025-05-07T19:45:53.7538421Z 2025-05-07T19:45:53.7538424Z 2025-05-07T19:45:53.7650477Z libcufft-11.3.3.41 | 147.4 MB | ######## | 80%  2025-05-07T19:45:53.7650802Z 2025-05-07T19:45:53.7651010Z 2025-05-07T19:45:53.7845245Z libcusparse-12.5.7.5 | 164.9 MB | ######2 | 63%  2025-05-07T19:45:53.7988519Z libcublas-12.8.3.14 | 460.2 MB | ##5 | 25% 2025-05-07T19:45:53.7988856Z 2025-05-07T19:45:53.8536310Z nsight-compute-2025. | 320.6 MB | ###5 | 35%  2025-05-07T19:45:53.8536625Z 2025-05-07T19:45:53.8536632Z 2025-05-07T19:45:53.8536637Z 2025-05-07T19:45:53.8536892Z libcusolver-11.7.2.5 | 156.9 MB | ########9 | 89%  2025-05-07T19:45:53.8537203Z 2025-05-07T19:45:53.8537206Z 2025-05-07T19:45:53.8537211Z 2025-05-07T19:45:53.8537215Z 2025-05-07T19:45:53.8652428Z libcufft-11.3.3.41 | 147.4 MB | ########3 | 84%  2025-05-07T19:45:53.8652782Z 2025-05-07T19:45:53.8652788Z 2025-05-07T19:45:53.8863111Z libcusparse-12.5.7.5 | 164.9 MB | ######5 | 65%  2025-05-07T19:45:53.9052253Z libcublas-12.8.3.14 | 460.2 MB | ##6 | 26% 2025-05-07T19:45:53.9052779Z 2025-05-07T19:45:53.9583709Z nsight-compute-2025. 
| 320.6 MB | ###6 | 37%  2025-05-07T19:45:53.9584066Z 2025-05-07T19:45:53.9584089Z 2025-05-07T19:45:53.9584094Z 2025-05-07T19:45:53.9584101Z 2025-05-07T19:45:53.9589708Z libcufft-11.3.3.41 | 147.4 MB | ########7 | 87%  2025-05-07T19:45:53.9590000Z 2025-05-07T19:45:53.9590004Z 2025-05-07T19:45:53.9590017Z 2025-05-07T19:45:53.9653390Z libcusolver-11.7.2.5 | 156.9 MB | #########2 | 93%  2025-05-07T19:45:53.9653700Z 2025-05-07T19:45:53.9653705Z 2025-05-07T19:45:53.9951921Z libcusparse-12.5.7.5 | 164.9 MB | ######8 | 68%  2025-05-07T19:45:54.0059929Z libcublas-12.8.3.14 | 460.2 MB | ##7 | 27% 2025-05-07T19:45:54.0060321Z 2025-05-07T19:45:54.0624800Z nsight-compute-2025. | 320.6 MB | ###8 | 39%  2025-05-07T19:45:54.0625136Z 2025-05-07T19:45:54.0625141Z 2025-05-07T19:45:54.0625145Z 2025-05-07T19:45:54.0655624Z libcusolver-11.7.2.5 | 156.9 MB | #########5 | 96%  2025-05-07T19:45:54.0656093Z 2025-05-07T19:45:54.0656353Z 2025-05-07T19:45:54.0712494Z libcusparse-12.5.7.5 | 164.9 MB | #######1 | 71%  2025-05-07T19:45:54.0713388Z 2025-05-07T19:45:54.0713404Z 2025-05-07T19:45:54.0713415Z 2025-05-07T19:45:54.0713426Z 2025-05-07T19:45:54.0988675Z libcufft-11.3.3.41 | 147.4 MB | ######### | 90%  2025-05-07T19:45:54.1145148Z libcublas-12.8.3.14 | 460.2 MB | ##8 | 28% 2025-05-07T19:45:54.1145434Z 2025-05-07T19:45:54.1655974Z nsight-compute-2025. | 320.6 MB | #### | 40%  2025-05-07T19:45:54.1656278Z 2025-05-07T19:45:54.1656284Z 2025-05-07T19:45:54.1735702Z libcusparse-12.5.7.5 | 164.9 MB | #######3 | 74%  2025-05-07T19:45:54.1736602Z 2025-05-07T19:45:54.1736616Z 2025-05-07T19:45:54.1736627Z 2025-05-07T19:45:54.1754027Z libcusolver-11.7.2.5 | 156.9 MB | #########8 | 99%  2025-05-07T19:45:54.1754862Z 2025-05-07T19:45:54.1754876Z 2025-05-07T19:45:54.1754887Z 2025-05-07T19:45:54.1754918Z 2025-05-07T19:45:54.2145641Z libcufft-11.3.3.41 | 147.4 MB | #########3 | 94%  2025-05-07T19:45:54.2145984Z 2025-05-07T19:45:54.2146223Z nsight-compute-2025. 
| 320.6 MB | ####1 | 42%  2025-05-07T19:45:54.2661548Z libcublas-12.8.3.14 | 460.2 MB | ##9 | 29% 2025-05-07T19:45:54.2662076Z 2025-05-07T19:45:54.2662091Z 2025-05-07T19:45:54.2754826Z libcusparse-12.5.7.5 | 164.9 MB | #######7 | 77%  2025-05-07T19:45:54.2755140Z 2025-05-07T19:45:54.2755146Z 2025-05-07T19:45:54.2755150Z 2025-05-07T19:45:54.2755155Z 2025-05-07T19:45:54.3145528Z libcufft-11.3.3.41 | 147.4 MB | #########8 | 99%  2025-05-07T19:45:54.3145845Z 2025-05-07T19:45:54.3412827Z nsight-compute-2025. | 320.6 MB | ####4 | 45%  2025-05-07T19:45:54.3662962Z libcublas-12.8.3.14 | 460.2 MB | ### | 30% 2025-05-07T19:45:54.3663281Z 2025-05-07T19:45:54.3663287Z 2025-05-07T19:45:54.4146156Z libcusparse-12.5.7.5 | 164.9 MB | ########3 | 83%  2025-05-07T19:45:54.4146466Z 2025-05-07T19:45:54.4414629Z nsight-compute-2025. | 320.6 MB | ####7 | 48%  2025-05-07T19:45:54.4726346Z libcublas-12.8.3.14 | 460.2 MB | ###2 | 32% 2025-05-07T19:45:54.4726631Z 2025-05-07T19:45:54.4726636Z 2025-05-07T19:45:54.5149778Z libcusparse-12.5.7.5 | 164.9 MB | ########9 | 89%  2025-05-07T19:45:54.5150089Z 2025-05-07T19:45:54.5415106Z nsight-compute-2025. | 320.6 MB | #####1 | 52%  2025-05-07T19:45:54.5727488Z libcublas-12.8.3.14 | 460.2 MB | ###3 | 34% 2025-05-07T19:45:54.5727825Z 2025-05-07T19:45:54.5727925Z 2025-05-07T19:45:54.6150579Z libcusparse-12.5.7.5 | 164.9 MB | #########4 | 94%  2025-05-07T19:45:54.6151092Z 2025-05-07T19:45:54.6416249Z nsight-compute-2025. | 320.6 MB | #####6 | 57%  2025-05-07T19:45:54.7150352Z libcublas-12.8.3.14 | 460.2 MB | ###4 | 35% 2025-05-07T19:45:54.7150645Z 2025-05-07T19:45:54.7295561Z nsight-compute-2025. | 320.6 MB | ###### | 61%  2025-05-07T19:45:54.7296041Z 2025-05-07T19:45:54.7296046Z 2025-05-07T19:45:54.7582740Z libcusparse-12.5.7.5 | 164.9 MB | #########8 | 98%  2025-05-07T19:45:54.8150876Z libcublas-12.8.3.14 | 460.2 MB | ###6 | 36% 2025-05-07T19:45:54.8151182Z 2025-05-07T19:45:54.8583247Z nsight-compute-2025. 
| 320.6 MB | ######4 | 65%  2025-05-07T19:45:54.9151630Z libcublas-12.8.3.14 | 460.2 MB | ###8 | 38% 2025-05-07T19:45:54.9151936Z 2025-05-07T19:45:54.9647403Z nsight-compute-2025. | 320.6 MB | ######9 | 70%  2025-05-07T19:45:55.0152082Z libcublas-12.8.3.14 | 460.2 MB | #### | 40% 2025-05-07T19:45:55.0152413Z 2025-05-07T19:45:55.1485217Z nsight-compute-2025. | 320.6 MB | #######5 | 76%  2025-05-07T19:45:55.1485528Z 2025-05-07T19:45:55.1485535Z 2025-05-07T19:45:55.1485539Z 2025-05-07T19:45:55.1485545Z 2025-05-07T19:45:55.1644465Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:45:55.1644827Z 2025-05-07T19:45:55.1644833Z 2025-05-07T19:45:55.1644838Z 2025-05-07T19:45:55.1691730Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:45:55.1944806Z libcublas-12.8.3.14 | 460.2 MB | ####1 | 42% 2025-05-07T19:45:55.1945113Z 2025-05-07T19:45:55.1945118Z 2025-05-07T19:45:55.1945123Z 2025-05-07T19:45:55.1945126Z 2025-05-07T19:45:55.1945130Z 2025-05-07T19:45:55.2025990Z libnpp-12.3.3.65 | 130.6 MB | | 0%  2025-05-07T19:45:55.2026321Z 2025-05-07T19:45:55.2026327Z 2025-05-07T19:45:55.2026330Z 2025-05-07T19:45:55.2026334Z 2025-05-07T19:45:55.2026339Z 2025-05-07T19:45:55.2026342Z 2025-05-07T19:45:55.2768471Z cuda-nsight-12.8.55 | 113.2 MB | | 0%  2025-05-07T19:45:55.2946386Z libcublas-12.8.3.14 | 460.2 MB | ####2 | 43% 2025-05-07T19:45:55.2947161Z 2025-05-07T19:45:55.2947175Z 2025-05-07T19:45:55.2947614Z 2025-05-07T19:45:55.2947657Z 2025-05-07T19:45:55.2947669Z 2025-05-07T19:45:55.3028375Z libnpp-12.3.3.65 | 130.6 MB | 4 | 4%  2025-05-07T19:45:55.3028694Z 2025-05-07T19:45:55.3028699Z 2025-05-07T19:45:55.3028720Z 2025-05-07T19:45:55.3028723Z 2025-05-07T19:45:55.3028727Z 2025-05-07T19:45:55.3028731Z 2025-05-07T19:45:55.3869775Z cuda-nsight-12.8.55 | 113.2 MB | 4 | 4%  2025-05-07T19:45:55.3946932Z libcublas-12.8.3.14 | 460.2 MB | ####4 | 44% 2025-05-07T19:45:55.3947257Z 2025-05-07T19:45:55.3947263Z 2025-05-07T19:45:55.3947269Z 2025-05-07T19:45:55.3947275Z 
2025-05-07T19:45:55.3947281Z 2025-05-07T19:45:55.4030476Z libnpp-12.3.3.65 | 130.6 MB | 8 | 8%  2025-05-07T19:45:55.4030795Z 2025-05-07T19:45:55.4030828Z 2025-05-07T19:45:55.4030834Z 2025-05-07T19:45:55.4030840Z 2025-05-07T19:45:55.4030846Z 2025-05-07T19:45:55.4030851Z 2025-05-07T19:45:55.4347850Z cuda-nsight-12.8.55 | 113.2 MB | 9 | 9%  2025-05-07T19:45:55.4348197Z 2025-05-07T19:45:55.4948853Z nsight-compute-2025. | 320.6 MB | ######## | 81%  2025-05-07T19:45:55.4949177Z 2025-05-07T19:45:55.4949182Z 2025-05-07T19:45:55.4949223Z 2025-05-07T19:45:55.4949228Z 2025-05-07T19:45:55.4949233Z 2025-05-07T19:45:55.4985661Z libnpp-12.3.3.65 | 130.6 MB | #2 | 13%  2025-05-07T19:45:55.5031445Z libcublas-12.8.3.14 | 460.2 MB | ####5 | 45% 2025-05-07T19:45:55.5031768Z 2025-05-07T19:45:55.5031774Z 2025-05-07T19:45:55.5031778Z 2025-05-07T19:45:55.5031783Z 2025-05-07T19:45:55.5031788Z 2025-05-07T19:45:55.5031794Z 2025-05-07T19:45:55.5952005Z cuda-nsight-12.8.55 | 113.2 MB | #3 | 14%  2025-05-07T19:45:55.5952374Z 2025-05-07T19:45:55.5952379Z 2025-05-07T19:45:55.5952383Z 2025-05-07T19:45:55.5952387Z 2025-05-07T19:45:55.5952391Z 2025-05-07T19:45:55.6037518Z libnpp-12.3.3.65 | 130.6 MB | #6 | 17%  2025-05-07T19:45:55.6037983Z 2025-05-07T19:45:55.6037992Z 2025-05-07T19:45:55.6037998Z 2025-05-07T19:45:55.6038005Z 2025-05-07T19:45:55.6038014Z 2025-05-07T19:45:55.6038023Z 2025-05-07T19:45:55.6125765Z cuda-nsight-12.8.55 | 113.2 MB | #8 | 18%  2025-05-07T19:45:55.6745053Z libcublas-12.8.3.14 | 460.2 MB | ####6 | 46% 2025-05-07T19:45:55.6745559Z 2025-05-07T19:45:55.6951005Z nsight-compute-2025. 
| 320.6 MB | ########4 | 84%  2025-05-07T19:45:55.6951318Z 2025-05-07T19:45:55.6951323Z 2025-05-07T19:45:55.6951328Z 2025-05-07T19:45:55.6951334Z 2025-05-07T19:45:55.6951352Z 2025-05-07T19:45:55.7042605Z libnpp-12.3.3.65 | 130.6 MB | ## | 21%  2025-05-07T19:45:55.7043074Z 2025-05-07T19:45:55.7043079Z 2025-05-07T19:45:55.7043083Z 2025-05-07T19:45:55.7043086Z 2025-05-07T19:45:55.7043090Z 2025-05-07T19:45:55.7043093Z 2025-05-07T19:45:55.7126694Z cuda-nsight-12.8.55 | 113.2 MB | ##2 | 23%  2025-05-07T19:45:55.7954217Z libcublas-12.8.3.14 | 460.2 MB | ####7 | 48% 2025-05-07T19:45:55.7954541Z 2025-05-07T19:45:55.7954546Z 2025-05-07T19:45:55.7954552Z 2025-05-07T19:45:55.7954556Z 2025-05-07T19:45:55.7954561Z 2025-05-07T19:45:55.8045103Z libnpp-12.3.3.65 | 130.6 MB | ##4 | 25%  2025-05-07T19:45:55.8045533Z 2025-05-07T19:45:55.8045537Z 2025-05-07T19:45:55.8045541Z 2025-05-07T19:45:55.8045545Z 2025-05-07T19:45:55.8045548Z 2025-05-07T19:45:55.8045552Z 2025-05-07T19:45:55.8564907Z cuda-nsight-12.8.55 | 113.2 MB | ##7 | 28%  2025-05-07T19:45:55.8565320Z 2025-05-07T19:45:55.8783908Z nsight-compute-2025. 
| 320.6 MB | ########7 | 88%  2025-05-07T19:45:55.8953856Z libcublas-12.8.3.14 | 460.2 MB | ####8 | 49% 2025-05-07T19:45:55.8954169Z 2025-05-07T19:45:55.8954175Z 2025-05-07T19:45:55.8954181Z 2025-05-07T19:45:55.8954188Z 2025-05-07T19:45:55.8954193Z 2025-05-07T19:45:55.9046229Z libnpp-12.3.3.65 | 130.6 MB | ##9 | 30%  2025-05-07T19:45:55.9046931Z 2025-05-07T19:45:55.9046940Z 2025-05-07T19:45:55.9046945Z 2025-05-07T19:45:55.9046950Z 2025-05-07T19:45:55.9046955Z 2025-05-07T19:45:55.9046960Z 2025-05-07T19:45:55.9784609Z cuda-nsight-12.8.55 | 113.2 MB | ###2 | 33%  2025-05-07T19:45:56.0042697Z libcublas-12.8.3.14 | 460.2 MB | ####9 | 50% 2025-05-07T19:45:56.0043010Z 2025-05-07T19:45:56.0043149Z 2025-05-07T19:45:56.0043162Z 2025-05-07T19:45:56.0043172Z 2025-05-07T19:45:56.0043178Z 2025-05-07T19:45:56.0088726Z libnpp-12.3.3.65 | 130.6 MB | ###3 | 34%  2025-05-07T19:45:56.0089110Z 2025-05-07T19:45:56.0089117Z 2025-05-07T19:45:56.0089123Z 2025-05-07T19:45:56.0089129Z 2025-05-07T19:45:56.0089134Z 2025-05-07T19:45:56.0089139Z 2025-05-07T19:45:56.0118578Z cuda-nsight-12.8.55 | 113.2 MB | ###7 | 37%  2025-05-07T19:45:56.0118902Z 2025-05-07T19:45:56.0785836Z nsight-compute-2025. | 320.6 MB | ######### | 90%  2025-05-07T19:45:56.1093341Z libcublas-12.8.3.14 | 460.2 MB | ##### | 51% 2025-05-07T19:45:56.1093712Z 2025-05-07T19:45:56.1093809Z 2025-05-07T19:45:56.1093815Z 2025-05-07T19:45:56.1093864Z 2025-05-07T19:45:56.1093872Z 2025-05-07T19:45:56.1093981Z 2025-05-07T19:45:56.1183002Z cuda-nsight-12.8.55 | 113.2 MB | ####2 | 42%  2025-05-07T19:45:56.1183538Z 2025-05-07T19:45:56.1183544Z 2025-05-07T19:45:56.1183549Z 2025-05-07T19:45:56.1183554Z 2025-05-07T19:45:56.1183560Z 2025-05-07T19:45:56.1604741Z libnpp-12.3.3.65 | 130.6 MB | ###8 | 38%  2025-05-07T19:45:56.1605102Z 2025-05-07T19:45:56.1786767Z nsight-compute-2025. 
| 320.6 MB | #########2 | 92%  2025-05-07T19:45:56.2094923Z libcublas-12.8.3.14 | 460.2 MB | #####2 | 52% 2025-05-07T19:45:56.2095734Z 2025-05-07T19:45:56.2095749Z 2025-05-07T19:45:56.2095761Z 2025-05-07T19:45:56.2095771Z 2025-05-07T19:45:56.2095782Z 2025-05-07T19:45:56.2095794Z 2025-05-07T19:45:56.2185962Z cuda-nsight-12.8.55 | 113.2 MB | ####7 | 47%  2025-05-07T19:45:56.2186292Z 2025-05-07T19:45:56.2186297Z 2025-05-07T19:45:56.2186301Z 2025-05-07T19:45:56.2186306Z 2025-05-07T19:45:56.2186685Z 2025-05-07T19:45:56.2824763Z libnpp-12.3.3.65 | 130.6 MB | ####2 | 42%  2025-05-07T19:45:56.2896209Z libcublas-12.8.3.14 | 460.2 MB | #####3 | 53% 2025-05-07T19:45:56.2896528Z 2025-05-07T19:45:56.3096086Z nsight-compute-2025. | 320.6 MB | #########4 | 95%  2025-05-07T19:45:56.3096421Z 2025-05-07T19:45:56.3096428Z 2025-05-07T19:45:56.3096433Z 2025-05-07T19:45:56.3096441Z 2025-05-07T19:45:56.3096447Z 2025-05-07T19:45:56.3096452Z 2025-05-07T19:45:56.3186492Z cuda-nsight-12.8.55 | 113.2 MB | #####1 | 52%  2025-05-07T19:45:56.3186839Z 2025-05-07T19:45:56.3186845Z 2025-05-07T19:45:56.3186850Z 2025-05-07T19:45:56.3186853Z 2025-05-07T19:45:56.3186857Z 2025-05-07T19:45:56.4028782Z libnpp-12.3.3.65 | 130.6 MB | ####6 | 46%  2025-05-07T19:45:56.4029146Z 2025-05-07T19:45:56.4096206Z nsight-compute-2025. 
| 320.6 MB | #########6 | 97%  2025-05-07T19:45:56.4096612Z 2025-05-07T19:45:56.4096767Z 2025-05-07T19:45:56.4096775Z 2025-05-07T19:45:56.4097119Z 2025-05-07T19:45:56.4097327Z 2025-05-07T19:45:56.4097341Z 2025-05-07T19:45:56.4188143Z cuda-nsight-12.8.55 | 113.2 MB | #####7 | 57%  2025-05-07T19:45:56.4188467Z 2025-05-07T19:45:56.4188473Z 2025-05-07T19:45:56.4188477Z 2025-05-07T19:45:56.4188481Z 2025-05-07T19:45:56.4188491Z 2025-05-07T19:45:56.4362476Z libnpp-12.3.3.65 | 130.6 MB | ##### | 51%  2025-05-07T19:45:56.5183574Z libcublas-12.8.3.14 | 460.2 MB | #####4 | 54% 2025-05-07T19:45:56.5184243Z 2025-05-07T19:45:56.5184273Z 2025-05-07T19:45:56.5184285Z 2025-05-07T19:45:56.5184294Z 2025-05-07T19:45:56.5184307Z 2025-05-07T19:45:56.5184315Z 2025-05-07T19:45:56.5188029Z cuda-nsight-12.8.55 | 113.2 MB | ######1 | 62%  2025-05-07T19:45:56.5188852Z 2025-05-07T19:45:56.5266384Z nsight-compute-2025. | 320.6 MB | #########8 | 98%  2025-05-07T19:45:56.5266994Z 2025-05-07T19:45:56.5267004Z 2025-05-07T19:45:56.5267013Z 2025-05-07T19:45:56.5267058Z 2025-05-07T19:45:56.5267077Z 2025-05-07T19:45:56.5367018Z libnpp-12.3.3.65 | 130.6 MB | #####5 | 55%  2025-05-07T19:45:56.6253740Z libcublas-12.8.3.14 | 460.2 MB | #####5 | 55% 2025-05-07T19:45:56.6254215Z 2025-05-07T19:45:56.6254222Z 2025-05-07T19:45:56.6254227Z 2025-05-07T19:45:56.6254231Z 2025-05-07T19:45:56.6254237Z 2025-05-07T19:45:56.6254243Z 2025-05-07T19:45:56.6282257Z cuda-nsight-12.8.55 | 113.2 MB | ######6 | 67%  2025-05-07T19:45:56.6282680Z 2025-05-07T19:45:56.6282686Z 2025-05-07T19:45:56.6282691Z 2025-05-07T19:45:56.6282696Z 2025-05-07T19:45:56.6282700Z 2025-05-07T19:45:56.6364394Z libnpp-12.3.3.65 | 130.6 MB | #####9 | 59%  2025-05-07T19:45:56.7255433Z libcublas-12.8.3.14 | 460.2 MB | #####6 | 56% 2025-05-07T19:45:56.7255955Z 2025-05-07T19:45:56.7255960Z 2025-05-07T19:45:56.7255979Z 2025-05-07T19:45:56.7255984Z 2025-05-07T19:45:56.7255987Z 2025-05-07T19:45:56.7255991Z 2025-05-07T19:45:56.7282621Z cuda-nsight-12.8.55 | 
113.2 MB | #######2 | 73%  2025-05-07T19:45:56.7282976Z 2025-05-07T19:45:56.7282981Z 2025-05-07T19:45:56.7282984Z 2025-05-07T19:45:56.7282989Z 2025-05-07T19:45:56.7283008Z 2025-05-07T19:45:56.7369271Z libnpp-12.3.3.65 | 130.6 MB | ######4 | 64%  2025-05-07T19:45:56.8026810Z libcublas-12.8.3.14 | 460.2 MB | #####7 | 58% 2025-05-07T19:45:56.8027165Z 2025-05-07T19:45:56.8027171Z 2025-05-07T19:45:56.8257254Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100%  2025-05-07T19:45:56.8257618Z 2025-05-07T19:45:56.8257625Z 2025-05-07T19:45:56.8257631Z 2025-05-07T19:45:56.8257639Z 2025-05-07T19:45:56.8257644Z 2025-05-07T19:45:56.8257649Z 2025-05-07T19:45:56.8283216Z cuda-nsight-12.8.55 | 113.2 MB | #######8 | 78%  2025-05-07T19:45:56.8283584Z 2025-05-07T19:45:56.8283589Z 2025-05-07T19:45:56.8283594Z 2025-05-07T19:45:56.8283599Z 2025-05-07T19:45:56.8283604Z 2025-05-07T19:45:56.8371468Z libnpp-12.3.3.65 | 130.6 MB | ######9 | 69%  2025-05-07T19:45:56.8496856Z libcublas-12.8.3.14 | 460.2 MB | #####8 | 59% 2025-05-07T19:45:56.8497149Z 2025-05-07T19:45:56.8497387Z 2025-05-07T19:45:56.8497403Z 2025-05-07T19:45:56.8497410Z 2025-05-07T19:45:56.8497415Z 2025-05-07T19:45:56.8497420Z 2025-05-07T19:45:56.8497426Z 2025-05-07T19:45:56.9258161Z cuda-nvvp-12.8.57 | 112.4 MB | | 0%  2025-05-07T19:45:56.9258494Z 2025-05-07T19:45:56.9258499Z 2025-05-07T19:45:56.9258503Z 2025-05-07T19:45:56.9258507Z 2025-05-07T19:45:56.9258511Z 2025-05-07T19:45:56.9258515Z 2025-05-07T19:45:56.9285970Z cuda-nsight-12.8.55 | 113.2 MB | ########4 | 84%  2025-05-07T19:45:56.9286892Z 2025-05-07T19:45:56.9286954Z 2025-05-07T19:45:56.9286966Z 2025-05-07T19:45:56.9286976Z 2025-05-07T19:45:56.9286988Z 2025-05-07T19:45:56.9496697Z libnpp-12.3.3.65 | 130.6 MB | #######4 | 74%  2025-05-07T19:45:56.9497403Z 2025-05-07T19:45:56.9497409Z 2025-05-07T19:45:56.9497416Z 2025-05-07T19:45:56.9497422Z 2025-05-07T19:45:56.9497428Z 2025-05-07T19:45:56.9497434Z 2025-05-07T19:45:56.9497438Z 2025-05-07T19:45:56.9656507Z cuda-nvvp-12.8.57 
| 112.4 MB | 3 | 3%  2025-05-07T19:45:57.0291672Z libcublas-12.8.3.14 | 460.2 MB | #####9 | 60% 2025-05-07T19:45:57.0292058Z 2025-05-07T19:45:57.0292153Z 2025-05-07T19:45:57.0292158Z 2025-05-07T19:45:57.0292268Z 2025-05-07T19:45:57.0292277Z 2025-05-07T19:45:57.0498464Z libnpp-12.3.3.65 | 130.6 MB | #######9 | 79%  2025-05-07T19:45:57.0498840Z 2025-05-07T19:45:57.0498846Z 2025-05-07T19:45:57.0498851Z 2025-05-07T19:45:57.0498856Z 2025-05-07T19:45:57.0498861Z 2025-05-07T19:45:57.0498867Z 2025-05-07T19:45:57.0499177Z 2025-05-07T19:45:57.1294945Z cuda-nvvp-12.8.57 | 112.4 MB | 8 | 9%  2025-05-07T19:45:57.1295291Z 2025-05-07T19:45:57.1295310Z 2025-05-07T19:45:57.1295316Z 2025-05-07T19:45:57.1295355Z 2025-05-07T19:45:57.1295359Z 2025-05-07T19:45:57.1498832Z libnpp-12.3.3.65 | 130.6 MB | ########6 | 86%  2025-05-07T19:45:57.1499266Z 2025-05-07T19:45:57.1499273Z 2025-05-07T19:45:57.1499276Z 2025-05-07T19:45:57.1499281Z 2025-05-07T19:45:57.1499285Z 2025-05-07T19:45:57.1499288Z 2025-05-07T19:45:57.1499291Z 2025-05-07T19:45:57.1706449Z cuda-nvvp-12.8.57 | 112.4 MB | #5 | 16%  2025-05-07T19:45:57.1706821Z 2025-05-07T19:45:57.1706831Z 2025-05-07T19:45:57.1706840Z 2025-05-07T19:45:57.1706864Z 2025-05-07T19:45:57.1706870Z 2025-05-07T19:45:57.1706879Z 2025-05-07T19:45:57.2590388Z cuda-nsight-12.8.55 | 113.2 MB | ########9 | 89%  2025-05-07T19:45:57.2590710Z 2025-05-07T19:45:57.2590735Z 2025-05-07T19:45:57.2590749Z 2025-05-07T19:45:57.2590992Z 2025-05-07T19:45:57.2591576Z 2025-05-07T19:45:57.2605860Z libnpp-12.3.3.65 | 130.6 MB | #########1 | 92%  2025-05-07T19:45:57.2606285Z 2025-05-07T19:45:57.2606292Z 2025-05-07T19:45:57.2606296Z 2025-05-07T19:45:57.2606300Z 2025-05-07T19:45:57.2606303Z 2025-05-07T19:45:57.2606308Z 2025-05-07T19:45:57.2607520Z 2025-05-07T19:45:57.2634300Z cuda-nvvp-12.8.57 | 112.4 MB | ## | 20%  2025-05-07T19:45:57.2707836Z libcublas-12.8.3.14 | 460.2 MB | ######1 | 61% 2025-05-07T19:45:57.2708219Z 2025-05-07T19:45:57.2708248Z 2025-05-07T19:45:57.2708253Z 
2025-05-07T19:45:57.2708258Z 2025-05-07T19:45:57.2708262Z 2025-05-07T19:45:57.2710111Z 2025-05-07T19:45:57.3635716Z cuda-nsight-12.8.55 | 113.2 MB | #########4 | 94%  2025-05-07T19:45:57.3729525Z libcublas-12.8.3.14 | 460.2 MB | ######1 | 62% 2025-05-07T19:45:57.3730352Z 2025-05-07T19:45:57.3730368Z 2025-05-07T19:45:57.3730417Z 2025-05-07T19:45:57.3730432Z 2025-05-07T19:45:57.3730443Z 2025-05-07T19:45:57.3730455Z 2025-05-07T19:45:57.3777901Z cuda-nsight-12.8.55 | 113.2 MB | #########8 | 99%  2025-05-07T19:45:57.3778269Z 2025-05-07T19:45:57.3778547Z 2025-05-07T19:45:57.3778579Z 2025-05-07T19:45:57.3778605Z 2025-05-07T19:45:57.3778625Z 2025-05-07T19:45:57.3778651Z 2025-05-07T19:45:57.3778678Z 2025-05-07T19:45:57.4066622Z cuda-nvvp-12.8.57 | 112.4 MB | ##5 | 25%  2025-05-07T19:45:57.4067011Z 2025-05-07T19:45:57.4067016Z 2025-05-07T19:45:57.4067021Z 2025-05-07T19:45:57.4067025Z 2025-05-07T19:45:57.4067030Z 2025-05-07T19:45:57.4636316Z libnpp-12.3.3.65 | 130.6 MB | #########6 | 97%  2025-05-07T19:45:57.4780849Z libcublas-12.8.3.14 | 460.2 MB | ######3 | 63% 2025-05-07T19:45:57.4781381Z 2025-05-07T19:45:57.4781388Z 2025-05-07T19:45:57.4781393Z 2025-05-07T19:45:57.4781400Z 2025-05-07T19:45:57.4781406Z 2025-05-07T19:45:57.4781445Z 2025-05-07T19:45:57.4781450Z 2025-05-07T19:45:57.5638185Z cuda-nvvp-12.8.57 | 112.4 MB | ##9 | 30%  2025-05-07T19:45:57.5780623Z libcublas-12.8.3.14 | 460.2 MB | ######5 | 65% 2025-05-07T19:45:57.5781437Z 2025-05-07T19:45:57.5781442Z 2025-05-07T19:45:57.5781446Z 2025-05-07T19:45:57.5781450Z 2025-05-07T19:45:57.5781455Z 2025-05-07T19:45:57.5781460Z 2025-05-07T19:45:57.5781463Z 2025-05-07T19:45:57.6722753Z cuda-nvvp-12.8.57 | 112.4 MB | ###5 | 36%  2025-05-07T19:45:57.7268763Z libcublas-12.8.3.14 | 460.2 MB | ######6 | 66% 2025-05-07T19:45:57.7269107Z 2025-05-07T19:45:57.7269114Z 2025-05-07T19:45:57.7269119Z 2025-05-07T19:45:57.7269145Z 2025-05-07T19:45:57.7269150Z 2025-05-07T19:45:57.7269153Z 2025-05-07T19:45:57.7269156Z 
2025-05-07T19:45:57.7721066Z cuda-nvvp-12.8.57 | 112.4 MB | #### | 41%  2025-05-07T19:45:57.8402783Z libcublas-12.8.3.14 | 460.2 MB | ######8 | 68% 2025-05-07T19:45:57.8403172Z 2025-05-07T19:45:57.8403178Z 2025-05-07T19:45:57.8403183Z 2025-05-07T19:45:57.8403187Z 2025-05-07T19:45:57.8403192Z 2025-05-07T19:45:57.8403197Z 2025-05-07T19:45:57.8403202Z 2025-05-07T19:45:57.8723104Z cuda-nvvp-12.8.57 | 112.4 MB | ####5 | 45%  2025-05-07T19:45:57.9402588Z libcublas-12.8.3.14 | 460.2 MB | ####### | 70% 2025-05-07T19:45:57.9402932Z 2025-05-07T19:45:57.9402942Z 2025-05-07T19:45:57.9402948Z 2025-05-07T19:45:57.9402954Z 2025-05-07T19:45:57.9402960Z 2025-05-07T19:45:57.9402985Z 2025-05-07T19:45:57.9402991Z 2025-05-07T19:45:57.9725411Z cuda-nvvp-12.8.57 | 112.4 MB | ##### | 51%  2025-05-07T19:45:58.0403231Z libcublas-12.8.3.14 | 460.2 MB | #######2 | 72% 2025-05-07T19:45:58.0403591Z 2025-05-07T19:45:58.0403597Z 2025-05-07T19:45:58.0403603Z 2025-05-07T19:45:58.0403608Z 2025-05-07T19:45:58.0403612Z 2025-05-07T19:45:58.0403618Z 2025-05-07T19:45:58.0403622Z 2025-05-07T19:45:58.0726192Z cuda-nvvp-12.8.57 | 112.4 MB | #####5 | 56%  2025-05-07T19:45:58.1406103Z libcublas-12.8.3.14 | 460.2 MB | #######4 | 74% 2025-05-07T19:45:58.1406441Z 2025-05-07T19:45:58.1406495Z 2025-05-07T19:45:58.1406523Z 2025-05-07T19:45:58.1406527Z 2025-05-07T19:45:58.1406530Z 2025-05-07T19:45:58.1406534Z 2025-05-07T19:45:58.1406537Z 2025-05-07T19:45:58.1728222Z cuda-nvvp-12.8.57 | 112.4 MB | ######1 | 62%  2025-05-07T19:45:58.2406375Z libcublas-12.8.3.14 | 460.2 MB | #######5 | 76% 2025-05-07T19:45:58.2406948Z 2025-05-07T19:45:58.2406959Z 2025-05-07T19:45:58.2406969Z 2025-05-07T19:45:58.2406978Z 2025-05-07T19:45:58.2407014Z 2025-05-07T19:45:58.2407020Z 2025-05-07T19:45:58.2407026Z 2025-05-07T19:45:58.2743878Z cuda-nvvp-12.8.57 | 112.4 MB | ######7 | 67%  2025-05-07T19:45:58.3409190Z libcublas-12.8.3.14 | 460.2 MB | #######7 | 78% 2025-05-07T19:45:58.3409582Z 2025-05-07T19:45:58.3409591Z 
2025-05-07T19:45:58.3409633Z 2025-05-07T19:45:58.3409636Z 2025-05-07T19:45:58.3409640Z 2025-05-07T19:45:58.3409645Z 2025-05-07T19:45:58.3409649Z 2025-05-07T19:45:58.3818115Z cuda-nvvp-12.8.57 | 112.4 MB | #######3 | 74%  2025-05-07T19:45:58.4409353Z libcublas-12.8.3.14 | 460.2 MB | #######9 | 79% 2025-05-07T19:45:58.4409880Z 2025-05-07T19:45:58.4409914Z 2025-05-07T19:45:58.4409920Z 2025-05-07T19:45:58.4409924Z 2025-05-07T19:45:58.4409929Z 2025-05-07T19:45:58.4409934Z 2025-05-07T19:45:58.4409938Z 2025-05-07T19:45:58.4946300Z cuda-nvvp-12.8.57 | 112.4 MB | ######## | 81%  2025-05-07T19:45:58.5409946Z libcublas-12.8.3.14 | 460.2 MB | ########1 | 81% 2025-05-07T19:45:58.5410323Z 2025-05-07T19:45:58.5410353Z 2025-05-07T19:45:58.5410358Z 2025-05-07T19:45:58.5410363Z 2025-05-07T19:45:58.5410366Z 2025-05-07T19:45:58.5410377Z 2025-05-07T19:45:58.5410381Z 2025-05-07T19:45:58.5947440Z cuda-nvvp-12.8.57 | 112.4 MB | ########7 | 88%  2025-05-07T19:45:58.6411197Z libcublas-12.8.3.14 | 460.2 MB | ########2 | 83% 2025-05-07T19:45:58.6411552Z 2025-05-07T19:45:58.6411577Z 2025-05-07T19:45:58.6411582Z 2025-05-07T19:45:58.6411868Z 2025-05-07T19:45:58.6411871Z 2025-05-07T19:45:58.6411875Z 2025-05-07T19:45:58.6411878Z 2025-05-07T19:45:58.6531451Z cuda-nvvp-12.8.57 | 112.4 MB | #########4 | 95%  2025-05-07T19:45:58.6531853Z 2025-05-07T19:45:58.6531858Z 2025-05-07T19:45:58.6531862Z 2025-05-07T19:45:58.6531866Z 2025-05-07T19:45:58.6946925Z libcufft-11.3.3.41 | 147.4 MB | ########## | 100%  2025-05-07T19:45:58.7946985Z libcublas-12.8.3.14 | 460.2 MB | ########4 | 85% 2025-05-07T19:45:58.8969331Z libcublas-12.8.3.14 | 460.2 MB | ########7 | 87% 2025-05-07T19:45:58.9106903Z libcublas-12.8.3.14 | 460.2 MB | ########9 | 89% 2025-05-07T19:45:58.9107395Z 2025-05-07T19:45:58.9107402Z 2025-05-07T19:45:58.9107412Z 2025-05-07T19:45:58.9107417Z 2025-05-07T19:45:58.9107687Z 2025-05-07T19:45:58.9107744Z 2025-05-07T19:45:58.9549757Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100%  
2025-05-07T19:45:58.9550274Z 2025-05-07T19:45:58.9550285Z 2025-05-07T19:45:58.9550374Z 2025-05-07T19:45:58.9550383Z 2025-05-07T19:45:58.9550392Z 2025-05-07T19:45:58.9550401Z 2025-05-07T19:45:58.9550410Z 2025-05-07T19:45:58.9550420Z 2025-05-07T19:45:59.0020488Z cuda-nvrtc-12.8.61 | 63.1 MB | | 0%  2025-05-07T19:45:59.0551027Z libcublas-12.8.3.14 | 460.2 MB | #########1 | 91% 2025-05-07T19:45:59.0551420Z 2025-05-07T19:45:59.0551427Z 2025-05-07T19:45:59.0551431Z 2025-05-07T19:45:59.0551459Z 2025-05-07T19:45:59.0551463Z 2025-05-07T19:45:59.0551467Z 2025-05-07T19:45:59.0551471Z 2025-05-07T19:45:59.0551474Z 2025-05-07T19:45:59.1310386Z cuda-nvrtc-12.8.61 | 63.1 MB | #1 | 12%  2025-05-07T19:45:59.1551987Z libcublas-12.8.3.14 | 460.2 MB | #########3 | 93% 2025-05-07T19:45:59.1552435Z 2025-05-07T19:45:59.1552441Z 2025-05-07T19:45:59.1552445Z 2025-05-07T19:45:59.1552476Z 2025-05-07T19:45:59.1552481Z 2025-05-07T19:45:59.1552486Z 2025-05-07T19:45:59.1552492Z 2025-05-07T19:45:59.1552919Z 2025-05-07T19:45:59.2043218Z cuda-nvrtc-12.8.61 | 63.1 MB | ##3 | 24%  2025-05-07T19:45:59.2043611Z 2025-05-07T19:45:59.2043618Z 2025-05-07T19:45:59.2043624Z 2025-05-07T19:45:59.2043648Z 2025-05-07T19:45:59.2043651Z 2025-05-07T19:45:59.2386201Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100%  2025-05-07T19:45:59.2386563Z 2025-05-07T19:45:59.2386569Z 2025-05-07T19:45:59.2386574Z 2025-05-07T19:45:59.2386592Z 2025-05-07T19:45:59.2386596Z 2025-05-07T19:45:59.2386601Z 2025-05-07T19:45:59.2386605Z 2025-05-07T19:45:59.2386610Z 2025-05-07T19:45:59.2386614Z 2025-05-07T19:45:59.2457928Z libcurand-10.3.9.55 | 43.6 MB | | 0%  2025-05-07T19:45:59.2557216Z libcublas-12.8.3.14 | 460.2 MB | #########5 | 95% 2025-05-07T19:45:59.2557770Z 2025-05-07T19:45:59.2557779Z 2025-05-07T19:45:59.2557788Z 2025-05-07T19:45:59.2557792Z 2025-05-07T19:45:59.2557801Z 2025-05-07T19:45:59.2557809Z 2025-05-07T19:45:59.2557818Z 2025-05-07T19:45:59.2557851Z 2025-05-07T19:45:59.3386822Z cuda-nvrtc-12.8.61 | 63.1 MB | ###5 | 35%  
2025-05-07T19:45:59.3387186Z 2025-05-07T19:45:59.3387191Z 2025-05-07T19:45:59.3387196Z 2025-05-07T19:45:59.3387199Z 2025-05-07T19:45:59.3387206Z 2025-05-07T19:45:59.3387212Z 2025-05-07T19:45:59.3387218Z 2025-05-07T19:45:59.3387223Z 2025-05-07T19:45:59.3387229Z 2025-05-07T19:45:59.3559901Z libcurand-10.3.9.55 | 43.6 MB | #3 | 13%  2025-05-07T19:45:59.3560317Z 2025-05-07T19:45:59.3560322Z 2025-05-07T19:45:59.3560326Z 2025-05-07T19:45:59.3560329Z 2025-05-07T19:45:59.3560334Z 2025-05-07T19:45:59.3560339Z 2025-05-07T19:45:59.3560343Z 2025-05-07T19:45:59.3560349Z 2025-05-07T19:45:59.3859807Z cuda-nvrtc-12.8.61 | 63.1 MB | ####5 | 45%  2025-05-07T19:45:59.4392120Z libcublas-12.8.3.14 | 460.2 MB | #########7 | 97% 2025-05-07T19:45:59.4392491Z 2025-05-07T19:45:59.4392498Z 2025-05-07T19:45:59.4392776Z 2025-05-07T19:45:59.4392781Z 2025-05-07T19:45:59.4392786Z 2025-05-07T19:45:59.4392791Z 2025-05-07T19:45:59.4392798Z 2025-05-07T19:45:59.4392804Z 2025-05-07T19:45:59.4392876Z 2025-05-07T19:45:59.4613947Z libcurand-10.3.9.55 | 43.6 MB | ##8 | 28%  2025-05-07T19:45:59.4614327Z 2025-05-07T19:45:59.4614334Z 2025-05-07T19:45:59.4614357Z 2025-05-07T19:45:59.4614361Z 2025-05-07T19:45:59.4614364Z 2025-05-07T19:45:59.4614368Z 2025-05-07T19:45:59.4614371Z 2025-05-07T19:45:59.4614376Z 2025-05-07T19:45:59.5127169Z cuda-nvrtc-12.8.61 | 63.1 MB | #####5 | 55%  2025-05-07T19:45:59.5390693Z libcublas-12.8.3.14 | 460.2 MB | #########8 | 99% 2025-05-07T19:45:59.5391012Z 2025-05-07T19:45:59.5391017Z 2025-05-07T19:45:59.5391290Z 2025-05-07T19:45:59.5391322Z 2025-05-07T19:45:59.5391327Z 2025-05-07T19:45:59.5391332Z 2025-05-07T19:45:59.5391336Z 2025-05-07T19:45:59.5391341Z 2025-05-07T19:45:59.5391346Z 2025-05-07T19:45:59.5626745Z libcurand-10.3.9.55 | 43.6 MB | ####3 | 43%  2025-05-07T19:45:59.5627108Z 2025-05-07T19:45:59.5627115Z 2025-05-07T19:45:59.5627142Z 2025-05-07T19:45:59.5627147Z 2025-05-07T19:45:59.5627151Z 2025-05-07T19:45:59.5627156Z 2025-05-07T19:45:59.5627161Z 
2025-05-07T19:45:59.5627165Z 2025-05-07T19:45:59.6391858Z cuda-nvrtc-12.8.61 | 63.1 MB | ######5 | 65%  2025-05-07T19:45:59.6392269Z 2025-05-07T19:45:59.6392273Z 2025-05-07T19:45:59.6392277Z 2025-05-07T19:45:59.6392281Z 2025-05-07T19:45:59.6392284Z 2025-05-07T19:45:59.6392301Z 2025-05-07T19:45:59.6392306Z 2025-05-07T19:45:59.6392313Z 2025-05-07T19:45:59.6392319Z 2025-05-07T19:45:59.6628765Z libcurand-10.3.9.55 | 43.6 MB | #####9 | 59%  2025-05-07T19:45:59.6629135Z 2025-05-07T19:45:59.6629141Z 2025-05-07T19:45:59.6629146Z 2025-05-07T19:45:59.6629154Z 2025-05-07T19:45:59.6629159Z 2025-05-07T19:45:59.6629164Z 2025-05-07T19:45:59.6629189Z 2025-05-07T19:45:59.6629201Z 2025-05-07T19:45:59.7395331Z cuda-nvrtc-12.8.61 | 63.1 MB | #######7 | 78%  2025-05-07T19:45:59.7395671Z 2025-05-07T19:45:59.7395676Z 2025-05-07T19:45:59.7395681Z 2025-05-07T19:45:59.7395687Z 2025-05-07T19:45:59.7395692Z 2025-05-07T19:45:59.7395698Z 2025-05-07T19:45:59.7395705Z 2025-05-07T19:45:59.7395710Z 2025-05-07T19:45:59.7628171Z 2025-05-07T19:45:59.7629208Z libcurand-10.3.9.55 | 43.6 MB | #######8 | 79%  2025-05-07T19:45:59.7629656Z 2025-05-07T19:45:59.7629666Z 2025-05-07T19:45:59.7629672Z 2025-05-07T19:45:59.7629681Z 2025-05-07T19:45:59.7629689Z 2025-05-07T19:45:59.7629697Z 2025-05-07T19:45:59.7629707Z 2025-05-07T19:45:59.7629716Z 2025-05-07T19:45:59.8396708Z cuda-nvrtc-12.8.61 | 63.1 MB | #########1 | 91%  2025-05-07T19:45:59.8397077Z 2025-05-07T19:45:59.8397082Z 2025-05-07T19:45:59.8397087Z 2025-05-07T19:45:59.8397093Z 2025-05-07T19:45:59.8397100Z 2025-05-07T19:45:59.8397135Z 2025-05-07T19:45:59.8397139Z 2025-05-07T19:45:59.8397142Z 2025-05-07T19:45:59.8397146Z 2025-05-07T19:45:59.9509877Z libcurand-10.3.9.55 | 43.6 MB | #########7 | 97%  2025-05-07T19:45:59.9510276Z 2025-05-07T19:45:59.9510283Z 2025-05-07T19:45:59.9510289Z 2025-05-07T19:46:00.1526160Z libcusolver-11.7.2.5 | 156.9 MB | ########## | 100%  2025-05-07T19:46:00.1526527Z 2025-05-07T19:46:00.1526555Z 2025-05-07T19:46:00.1526562Z 
2025-05-07T19:46:00.2331200Z cuda-nsight-12.8.55 | 113.2 MB | ########## | 100% 2025-05-07T19:46:00.2706147Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100% 2025-05-07T19:46:00.3265829Z gds-tools-1.13.0.11 | 37.9 MB | | 0% 2025-05-07T19:46:00.3706777Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100% 2025-05-07T19:46:00.3718665Z gds-tools-1.13.0.11 | 37.9 MB | ##3 | 23% 2025-05-07T19:46:00.4707160Z libnvjitlink-12.8.61 | 28.7 MB | | 0%
2025-05-07T19:46:00.4725585Z gds-tools-1.13.0.11 | 37.9 MB | #### | 41% 2025-05-07T19:46:00.5709238Z libnvjitlink-12.8.61 | 28.7 MB | ##4 | 25% 2025-05-07T19:46:00.5726264Z gds-tools-1.13.0.11 | 37.9 MB | #####9 | 60% 2025-05-07T19:46:00.5947078Z libnvjitlink-12.8.61 | 28.7 MB | ####8 | 48% 2025-05-07T19:46:00.6315223Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100%
2025-05-07T19:46:00.6711300Z cuda-nvcc-tools-12.8 | 24.5 MB | | 0% 2025-05-07T19:46:00.6728564Z gds-tools-1.13.0.11 | 37.9 MB | #######7 | 77% 2025-05-07T19:46:00.7314802Z libnvjitlink-12.8.61 | 28.7 MB | #######2 | 72% 2025-05-07T19:46:00.7730018Z cuda-nvcc-tools-12.8 | 24.5 MB | ##4 | 24% 2025-05-07T19:46:00.7906222Z libnvjitlink-12.8.61 | 28.7 MB | #########3 | 93%
2025-05-07T19:46:00.8315376Z gds-tools-1.13.0.11 | 37.9 MB | #########4 | 94% 2025-05-07T19:46:00.9316268Z cuda-nvcc-tools-12.8 | 24.5 MB | ####7 | 48% 2025-05-07T19:46:01.1328158Z cuda-nvcc-tools-12.8 | 24.5 MB | ########8 | 89% 2025-05-07T19:46:01.2029599Z libcusparse-12.5.7.5 | 164.9 MB | ########## | 100% 2025-05-07T19:46:01.2279834Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100% 2025-05-07T19:46:01.2386205Z nsight-compute-2025. | 320.6 MB | ########## | 100%
2025-05-07T19:46:01.2591214Z cuda-nvvm-tools-12.8 | 23.5 MB | | 0% 2025-05-07T19:46:01.2836769Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100% 2025-05-07T19:46:01.2935389Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100% 2025-05-07T19:46:01.3073085Z cuda-nvvm-impl-12.8. | 20.8 MB | | 0%
2025-05-07T19:46:01.3244102Z cuda-nvcc-dev_linux- | 12.7 MB | | 0% 2025-05-07T19:46:01.3386582Z cuda-sanitizer-api-1 | 8.8 MB | | 0% 2025-05-07T19:46:01.3941681Z cuda-nvvm-tools-12.8 | 23.5 MB | ##1 | 22% 2025-05-07T19:46:01.4073127Z cuda-nvvm-impl-12.8. | 20.8 MB | ## | 20%
2025-05-07T19:46:01.4246247Z cuda-nvcc-dev_linux- | 12.7 MB | ###3 | 34% 2025-05-07T19:46:01.4387912Z cuda-sanitizer-api-1 | 8.8 MB | ####5 | 46% 2025-05-07T19:46:01.4945153Z cuda-nvvm-tools-12.8 | 23.5 MB | #### | 41% 2025-05-07T19:46:01.5074215Z cuda-nvvm-impl-12.8. | 20.8 MB | ####1 | 41%
2025-05-07T19:46:01.5252459Z cuda-nvcc-dev_linux- | 12.7 MB | ######9 | 69% 2025-05-07T19:46:01.5389892Z cuda-sanitizer-api-1 | 8.8 MB | #########4 | 95% 2025-05-07T19:46:01.5946390Z cuda-nvvm-tools-12.8 | 23.5 MB | #####7 | 58% 2025-05-07T19:46:01.6390671Z cuda-nvvm-impl-12.8. | 20.8 MB | ######5 | 65%
2025-05-07T19:46:01.6946909Z cuda-nvvm-tools-12.8 | 23.5 MB | ########2 | 83% 2025-05-07T19:46:01.7105534Z cuda-nvvm-impl-12.8. | 20.8 MB | #########5 | 95% 2025-05-07T19:46:01.7461233Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100%
2025-05-07T19:46:01.7754536Z cuda-nvdisasm-12.8.5 | 4.9 MB | | 0% 2025-05-07T19:46:01.7755420Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100% 2025-05-07T19:46:01.8160809Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100% 2025-05-07T19:46:01.8562408Z cuda-cupti-dev-12.8. | 4.0 MB | | 0%
2025-05-07T19:46:01.8563553Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100% 2025-05-07T19:46:01.9082216Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100% 2025-05-07T19:46:01.9106295Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%
2025-05-07T19:46:01.9629315Z ... (more hidden) ... 2025-05-07T19:46:01.9977409Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100% 2025-05-07T19:46:02.0235011Z ... (more hidden) ...
2025-05-07T19:46:02.3174436Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100% 2025-05-07T19:46:02.4989315Z libcurand-10.3.9.55 | 43.6 MB | ########## | 100% 2025-05-07T19:46:02.7935085Z cuda-nvvp-12.8.57 | 112.4 MB | ########## | 100% 2025-05-07T19:46:02.9864459Z libnpp-12.3.3.65 | 130.6 MB | ########## | 100% 2025-05-07T19:46:03.3737507Z libnvjitlink-12.8.61 | 28.7 MB | ########## | 100%
2025-05-07T19:46:03.4143095Z cuda-nvcc-tools-12.8 | 24.5 MB | ########## | 100% 2025-05-07T19:46:03.6112151Z cuda-nvrtc-12.8.61 | 63.1 MB | ########## | 100% 2025-05-07T19:46:03.6235197Z cuda-sanitizer-api-1 | 8.8 MB | ########## | 100% 2025-05-07T19:46:03.6531496Z gds-tools-1.13.0.11 | 37.9 MB | ########## | 100% 2025-05-07T19:46:03.7455013Z cuda-nvdisasm-12.8.5 | 4.9 MB | ########## | 100% 2025-05-07T19:46:03.7795681Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100%
2025-05-07T19:46:03.7797488Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100% 2025-05-07T19:46:03.8548462Z cuda-cupti-dev-12.8. | 4.0 MB | ########## | 100%
2025-05-07T19:46:03.8862738Z cuda-nvcc-dev_linux- | 12.7 MB | ########## | 100% 2025-05-07T19:46:03.8863722Z ... (more hidden) ... 2025-05-07T19:46:04.0200276Z ... (more hidden) ...
2025-05-07T19:46:04.2705365Z cuda-nvvm-impl-12.8. | 20.8 MB | ########## | 100% 2025-05-07T19:46:07.2851275Z cuda-nvvm-tools-12.8 | 23.5 MB | ########## | 100% 2025-05-07T19:46:07.4867420Z nsight-compute-2025. | 320.6 MB | ########## | 100%
2025-05-07T19:46:07.4876621Z libcublas-12.8.3.14 | 460.2 MB | ########## | 100%
2025-05-07T19:46:07.4920185Z done 2025-05-07T19:46:07.7020852Z Preparing transaction: done 2025-05-07T19:46:08.5041390Z Verifying transaction: done 2025-05-07T19:46:08.8095551Z Executing transaction: done 2025-05-07T19:46:10.8067933Z [INSTALL] Fixing file placements for CUDA 12.8.0+ ... 2025-05-07T19:46:10.8068439Z [INSTALL] Creating symlinks: libnvToolsExt.so 2025-05-07T19:46:10.8069203Z + ln -sf /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:10.8070167Z 2025-05-07T19:46:10.8091233Z 2025-05-07T19:46:10.8093533Z + ln -sf /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so.1 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:10.8095936Z 2025-05-07T19:46:10.8109895Z 2025-05-07T19:46:10.8110046Z [INSTALL] Copying nvtx3 headers ...
2025-05-07T19:46:10.8115669Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/include/ 2025-05-07T19:46:10.8119878Z 2025-05-07T19:46:11.2570711Z 2025-05-07T19:46:11.2575198Z + cp -r /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCuda.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtCudaRt.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtOpenCL.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvToolsExtSync.h /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtx3.hpp /github/home/miniconda/envs/build_binary/nsight-compute-2025.1.0/host/target-linux-x64/nvtx/include/nvtx3/nvtxDetail /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/ 2025-05-07T19:46:11.2579569Z 
2025-05-07T19:46:11.2596191Z 2025-05-07T19:46:11.2596759Z [INSTALL] Appending libcuda.so path to LD_LIBRARY_PATH ... 2025-05-07T19:46:11.3024890Z [ENV] Appending to LD_LIBRARY_PATH: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs ... 2025-05-07T19:46:13.1091879Z ERROR conda.cli.main_run:execute(125): `conda run printenv LD_LIBRARY_PATH` failed. (See above for error) 2025-05-07T19:46:13.1667290Z + conda env config vars set -n build_binary LD_LIBRARY_PATH=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs 2025-05-07T19:46:13.1667872Z 2025-05-07T19:46:13.5813426Z 2025-05-07T19:46:13.5817838Z [INSTALL] Setting environment variable NVML_LIB_PATH ... 2025-05-07T19:46:13.6181090Z + conda env config vars set -n build_binary NVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:13.6182721Z 2025-05-07T19:46:14.0302195Z 2025-05-07T19:46:14.0303047Z [INSTALL] Setting environment variable CUDA_INCLUDE_DIRS ... 2025-05-07T19:46:14.0306239Z + conda env config vars set -n build_binary CUDA_INCLUDE_DIRS="/github/home/miniconda/envs/build_binary/include/:/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/" 2025-05-07T19:46:14.0307021Z 2025-05-07T19:46:14.4402283Z 2025-05-07T19:46:16.3992205Z [CHECK] cuda_runtime.h found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include/cuda_runtime.h 2025-05-07T19:46:18.3865789Z [CHECK] libcuda.so found in CONDA_PREFIX PATH (file): /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:46:20.3232968Z [CHECK] libnvToolsExt.so found in CONDA_PREFIX PATH (symbolic link): /github/home/miniconda/envs/build_binary/lib/libnvToolsExt.so 2025-05-07T19:46:20.3236044Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvToolsExt.so 2025-05-07T19:46:22.2467468Z [CHECK] libnvidia-ml.so found in CONDA_PREFIX PATH (file): 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libnvidia-ml.so 2025-05-07T19:46:24.0826385Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:46:24.0826710Z 2025-05-07T19:46:24.1595147Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:46:27.9018288Z /tmp/tmpol0g2paf: line 3: clang: command not found 2025-05-07T19:46:27.9018611Z 2025-05-07T19:46:27.9021788Z ERROR conda.cli.main_run:execute(125): `conda run clang --version` failed. (See above for error) 2025-05-07T19:46:27.9592798Z + ls -la /github/home/miniconda/envs/build_binary/etc/conda/activate.d 2025-05-07T19:46:27.9593215Z 2025-05-07T19:46:27.9615217Z total 56 2025-05-07T19:46:27.9616756Z drwxr-xr-x. 2 root root 16384 May 7 19:46 . 2025-05-07T19:46:27.9617810Z drwxr-xr-x. 5 root root 62 May 7 19:44 .. 2025-05-07T19:46:27.9619093Z -rw-r--r--. 2 root root 3778 Jun 10 2024 activate-binutils_linux-64.sh 2025-05-07T19:46:27.9620509Z -rw-r--r--. 2 root root 11630 Jun 10 2024 activate-gcc_linux-64.sh 2025-05-07T19:46:27.9621722Z -rw-r--r--. 2 root root 5190 Jun 10 2024 activate-gxx_linux-64.sh 2025-05-07T19:46:27.9622175Z -rw-r--r--. 2 root root 136 Mar 27 01:27 libglib_activate.sh 2025-05-07T19:46:27.9622600Z -rw-r--r--. 2 root root 872 May 7 16:10 libxml2_activate.sh 2025-05-07T19:46:27.9623045Z -rw-r--r--. 2 root root 499 Mar 28 22:35 openjdk_activate.sh 2025-05-07T19:46:27.9623637Z -rw-r--r--. 2 root root 2932 Jan 24 22:22 ~cuda-nvcc_activate.sh 2025-05-07T19:46:27.9623917Z 2025-05-07T19:46:27.9624148Z [INSTALL] Removing the -ccbin=CXX hook from NVCC activation scripts ... 2025-05-07T19:46:27.9624823Z + sed -i /-ccbin=/d /github/home/miniconda/envs/build_binary/etc/conda/activate.d/*cuda-nvcc_activate.sh 2025-05-07T19:46:27.9625270Z 2025-05-07T19:46:27.9642987Z 2025-05-07T19:46:27.9643280Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:46:27.9643591Z 2025-05-07T19:46:29.8473977Z 2025-05-07T19:46:29.8474446Z [BUILD] Setting prepend flags for NVCC ... 
2025-05-07T19:46:29.8475060Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-allow-unsupported-compiler" 2025-05-07T19:46:29.8475464Z 2025-05-07T19:46:30.2527842Z 2025-05-07T19:46:30.2528301Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:46:32.0995571Z 2025-05-07T19:46:32.0995979Z -allow-unsupported-compiler 2025-05-07T19:46:32.0996258Z 2025-05-07T19:46:32.1745796Z 2025-05-07T19:46:32.1746265Z [INFO] Printing out all preprocessor defines in nvcc ... 2025-05-07T19:46:32.1746889Z + conda run -n build_binary nvcc --compiler-options -dM -E -x cu - < /dev/null 2025-05-07T19:46:32.1747232Z 2025-05-07T19:46:34.0441011Z #define _GLIBCXX_DEPRECATED_SUGGEST(ALT) __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) 2025-05-07T19:46:34.0441960Z #define M_PIl 3.141592653589793238462643383279502884L 2025-05-07T19:46:34.0442371Z #define _IO_CURRENTLY_PUTTING 0x800 2025-05-07T19:46:34.0442754Z #define __W_EXITCODE(ret,sig) ((ret) << 8 | (sig)) 2025-05-07T19:46:34.0443125Z #define __DBL_MIN_EXP__ (-1021) 2025-05-07T19:46:34.0443447Z #define _STL_PAIR_H 1 2025-05-07T19:46:34.0443733Z #define __cpp_attributes 200809L 2025-05-07T19:46:34.0444072Z #define __cpp_nontype_template_parameter_auto 201606L 2025-05-07T19:46:34.0444451Z #define __DELETE_THROW throw() 2025-05-07T19:46:34.0444724Z #define _PTRDIFF_T_ 2025-05-07T19:46:34.0444980Z #define M_PI_4 0.78539816339744830962 2025-05-07T19:46:34.0445273Z #define __UINT_LEAST16_MAX__ 0xffff 2025-05-07T19:46:34.0445571Z #define _IO_LEFT 02 2025-05-07T19:46:34.0445805Z #define __ATOMIC_ACQUIRE 2 2025-05-07T19:46:34.0446103Z #define _POSIX2_BC_SCALE_MAX 99 2025-05-07T19:46:34.0446385Z #define _GLIBCXX_USE_RANDOM_TR1 1 2025-05-07T19:46:34.0446859Z #define _GLIBCXX_MOVE_BACKWARD3(_Tp,_Up,_Vp) std::move_backward(_Tp, _Up, _Vp) 2025-05-07T19:46:34.0447713Z #define __FLT128_MAX_10_EXP__ 4932 2025-05-07T19:46:34.0448005Z #define RE_DUP_MAX (0x7fff) 2025-05-07T19:46:34.0448279Z #define _IOS_OUTPUT 2 
2025-05-07T19:46:34.0448520Z #define __SM_100_RT_HPP__ 2025-05-07T19:46:34.0448851Z #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F 2025-05-07T19:46:34.0449235Z #define toascii_l(c,l) __toascii_l ((c), (l)) 2025-05-07T19:46:34.0449576Z #define __GCC_IEC_559_COMPLEX 2 2025-05-07T19:46:34.0449892Z #define _GLIBCXX_USE_FCHMOD 1 2025-05-07T19:46:34.0450504Z #define __cpp_aggregate_nsdmi 201304L 2025-05-07T19:46:34.0451523Z #define __bswap_16(x) (__extension__ ({ unsigned short int __v, __x = (unsigned short int) (x); if (__builtin_constant_p (__x)) __v = __bswap_constant_16 (__x); else __asm__ ("rorw $8, %w0" : "=r" (__v) : "0" (__x) : "cc"); __v; })) 2025-05-07T19:46:34.0452592Z #define __UINT_LEAST8_TYPE__ unsigned char 2025-05-07T19:46:34.0452932Z #define __SIZEOF_FLOAT80__ 16 2025-05-07T19:46:34.0453237Z #define cudaTextureTypeCubemapLayered 0xFC 2025-05-07T19:46:34.0453582Z #define _T_WCHAR_ 2025-05-07T19:46:34.0453830Z #define stdout stdout 2025-05-07T19:46:34.0454171Z #define _GLIBCXX_ABI_TAG_CXX11 __attribute ((__abi_tag__ ("cxx11"))) 2025-05-07T19:46:34.0454589Z #define CHAR_BIT __CHAR_BIT__ 2025-05-07T19:46:34.0454864Z #define __flexarr [] 2025-05-07T19:46:34.0455141Z #define _GLIBCXX_HAVE_FINITEF 1 2025-05-07T19:46:34.0455478Z #define __islower_l(c,l) __isctype_l((c), _ISlower, (l)) 2025-05-07T19:46:34.0455867Z #define _IO_FLAGS2_USER_WBUF 8 2025-05-07T19:46:34.0456139Z #define _MATH_H 1 2025-05-07T19:46:34.0456452Z #define cudaOccupancyDisableCachingOverride 0x01 2025-05-07T19:46:34.0456825Z #define __S64_TYPE long int 2025-05-07T19:46:34.0457084Z #define __stub_fchflags 2025-05-07T19:46:34.0457412Z #define cudaDeviceScheduleMask 0x07 2025-05-07T19:46:34.0457777Z #define __SQUAD_TYPE long int 2025-05-07T19:46:34.0458073Z #define __INTMAX_C(c) c ## L 2025-05-07T19:46:34.0458449Z #define cudaStreamFireAndForget ((cudaStream_t)0x4) 2025-05-07T19:46:34.0458843Z #define _BSD_SIZE_T_DEFINED_ 2025-05-07T19:46:34.0459173Z #define NL_NMAX INT_MAX 
2025-05-07T19:46:34.0459433Z #define _BITS_TIME_H 1 2025-05-07T19:46:34.0459773Z #define M_LN10l 2.302585092994045684017991454684364208L 2025-05-07T19:46:34.0460137Z #define _GLIBCXX_TXN_SAFE_DYN 2025-05-07T19:46:34.0460510Z #define cudaStreamTailLaunch ((cudaStream_t)0x3) 2025-05-07T19:46:34.0460903Z #define M_El 2.718281828459045235360287471352662498L 2025-05-07T19:46:34.0461404Z #define _PSTL_PRAGMA_DECLARE_SIMD _PSTL_PRAGMA(omp declare simd) 2025-05-07T19:46:34.0461843Z #define __CHAR_BIT__ 8 2025-05-07T19:46:34.0462131Z #define __FSWORD_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0462504Z #define _PSTL_STRING_CONCAT(x,y) x #y 2025-05-07T19:46:34.0462835Z #define _GLIBCXX98_USE_C99_MATH 1 2025-05-07T19:46:34.0463167Z #define FP_NAN 0 2025-05-07T19:46:34.0463613Z #define makedev(maj,min) gnu_dev_makedev (maj, min) 2025-05-07T19:46:34.0464089Z #define cudaGetDeviceProperties cudaGetDeviceProperties_v2 2025-05-07T19:46:34.0464583Z #define __cudaCDP2GetErrorString 2025-05-07T19:46:34.0464927Z #define SHRT_MAX __SHRT_MAX__ 2025-05-07T19:46:34.0465240Z #define _GLIBCXX_X86_RDSEED 1 2025-05-07T19:46:34.0465517Z #define __SM_80_RT_H__ 2025-05-07T19:46:34.0465790Z #define _NEW 2025-05-07T19:46:34.0466034Z #define CLOCK_PROCESS_CPUTIME_ID 2 2025-05-07T19:46:34.0466360Z #define __UINT8_MAX__ 0xff 2025-05-07T19:46:34.0466751Z #define _PSTL_ASSERT_MSG(_Condition,_Message) __glibcxx_assert(_Condition) 2025-05-07T19:46:34.0467215Z #define __SCHAR_WIDTH__ 8 2025-05-07T19:46:34.0467480Z #define __USE_ANSI 1 2025-05-07T19:46:34.0467814Z #define _IO_BE(expr,res) __builtin_expect ((expr), res) 2025-05-07T19:46:34.0468240Z #define __isupper_l(c,l) __isctype_l((c), _ISupper, (l)) 2025-05-07T19:46:34.0468655Z #define __cudaCDP2Memcpy2DAsync_ptsz 2025-05-07T19:46:34.0469008Z #define __WINT_MAX__ 0xffffffffU 2025-05-07T19:46:34.0469312Z #define __SIZEOF_PTHREAD_ATTR_T 56 2025-05-07T19:46:34.0469768Z #define __FLT32_MIN_EXP__ (-125) 2025-05-07T19:46:34.0470186Z #define 
_GLIBCXX_END_NAMESPACE_LDBL 2025-05-07T19:46:34.0470508Z #define PIPE_BUF 4096 2025-05-07T19:46:34.0470837Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC_2ARGS(PRM1,PRM2) 2025-05-07T19:46:34.0471318Z #define _GLIBCXX_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_NAMESPACE_CXX11 2025-05-07T19:46:34.0471699Z #define ADJ_TICK 0x4000 2025-05-07T19:46:34.0472009Z #define _PSTL_VERSION_PATCH (_PSTL_VERSION % 10) 2025-05-07T19:46:34.0472363Z #define MQ_PRIO_MAX 32768 2025-05-07T19:46:34.0472630Z #define __SIZEOF_PTHREAD_MUTEXATTR_T 4 2025-05-07T19:46:34.0472979Z #define __WAIT_INT(status) (*(int *) &(status)) 2025-05-07T19:46:34.0473444Z #define __GLIBC_PREREQ(maj,min) ((__GLIBC__ << 16) + __GLIBC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:34.0474065Z #define cudaCooperativeLaunchMultiDeviceNoPreSync 0x01 2025-05-07T19:46:34.0474440Z #define _XOPEN_SOURCE 700 2025-05-07T19:46:34.0474742Z #define _POSIX2_BC_DIM_MAX 2048 2025-05-07T19:46:34.0475038Z #define __VECTOR_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0475367Z #define __cpp_static_assert 201411L 2025-05-07T19:46:34.0475690Z #define __GLIBCXX__ 20230528 2025-05-07T19:46:34.0475966Z #define _GLIBCXX_HAVE_STRXFRM_L 1 2025-05-07T19:46:34.0476288Z #define _POSIX_TTY_NAME_MAX 9 2025-05-07T19:46:34.0476575Z #define _GLIBCXX_USE_WEAK_REF __GXX_WEAK__ 2025-05-07T19:46:34.0476915Z #define __OFF_T_MATCHES_OFF64_T 1 2025-05-07T19:46:34.0477201Z #define __ORDER_LITTLE_ENDIAN__ 1234 2025-05-07T19:46:34.0477533Z #define __SIZE_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0477894Z #define __ispunct_l(c,l) __isctype_l((c), _ISpunct, (l)) 2025-05-07T19:46:34.0478268Z #define __WCHAR_MAX__ 0x7fffffff 2025-05-07T19:46:34.0478557Z #define _GLIBCXX_USE_CLOCK_MONOTONIC 1 2025-05-07T19:46:34.0478903Z #define __BLKCNT_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0479291Z #define __isprint_l(c,l) __isctype_l((c), _ISprint, (l)) 2025-05-07T19:46:34.0479652Z #define cudaNvSciSyncAttrSignal 0x1 2025-05-07T19:46:34.0479975Z #define _GLIBCXX_USE_LONG_LONG 
1 2025-05-07T19:46:34.0480274Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 2025-05-07T19:46:34.0480632Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 2025-05-07T19:46:34.0480961Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 2025-05-07T19:46:34.0481392Z #define __DBL_DENORM_MIN__ double(4.94065645841246544176568792868221372e-324L) 2025-05-07T19:46:34.0481813Z #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 2025-05-07T19:46:34.0482152Z #define ADJ_ESTERROR 0x0008 2025-05-07T19:46:34.0482448Z #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 2025-05-07T19:46:34.0482731Z #define __GCC_IEC_559 2 2025-05-07T19:46:34.0483052Z #define __cpp_lib_transformation_trait_aliases 201304 2025-05-07T19:46:34.0483391Z #define _IO_flockfile(_fp) 2025-05-07T19:46:34.0483689Z #define CLOCK_MONOTONIC_RAW 4 2025-05-07T19:46:34.0483962Z #define __FLT32X_DECIMAL_DIG__ 17 2025-05-07T19:46:34.0484251Z #define _IOFBF 0 2025-05-07T19:46:34.0484472Z #define __USE_BSD 1 2025-05-07T19:46:34.0484731Z #define __FLT_EVAL_METHOD__ 0 2025-05-07T19:46:34.0485017Z #define SHRT_MIN (-SHRT_MAX - 1) 2025-05-07T19:46:34.0485319Z #define _IO_USER_LOCK 0x8000 2025-05-07T19:46:34.0485609Z #define _IO_NO_WRITES 8 2025-05-07T19:46:34.0485871Z #define _GLIBCXX_PSEUDO_VISIBILITY(V) 2025-05-07T19:46:34.0486262Z #define __ASMNAME2(prefix,cname) __STRING (prefix) cname 2025-05-07T19:46:34.0486621Z #define _GLIBCXX_HAVE_SYS_STAT_H 1 2025-05-07T19:46:34.0486966Z #define MB_CUR_MAX (__ctype_get_mb_cur_max ()) 2025-05-07T19:46:34.0487296Z #define __cpp_binary_literals 201304L 2025-05-07T19:46:34.0487629Z #define _CPP_TYPE_TRAITS_H 1 2025-05-07T19:46:34.0487887Z #define __BEGIN_NAMESPACE_C99 2025-05-07T19:46:34.0488159Z #define __FLT64_DECIMAL_DIG__ 17 2025-05-07T19:46:34.0488472Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(A) 2025-05-07T19:46:34.0488852Z #define _G_HAVE_ST_BLKSIZE defined (_STATBUF_ST_BLKSIZE) 2025-05-07T19:46:34.0490154Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed 
in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:46:34.0490939Z 2025-05-07T19:46:34.0491063Z #define __cpp_noexcept_function_type 201510L 2025-05-07T19:46:34.0491386Z #define M_PI 3.14159265358979323846 2025-05-07T19:46:34.0491704Z #define _GLIBCXX_PACKAGE_NAME "package-unused" 2025-05-07T19:46:34.0492027Z #define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 2025-05-07T19:46:34.0492346Z #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 2025-05-07T19:46:34.0492646Z #define _POSIX_DELAYTIMER_MAX 32 2025-05-07T19:46:34.0492933Z #define _GLIBCXX_USE_UTIME 1 2025-05-07T19:46:34.0493193Z #define _STL_ITERATOR_BASE_FUNCS_H 1 2025-05-07T19:46:34.0493844Z #define _IO_peekc_unlocked(_fp) (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) && __underflow (_fp) == EOF ? EOF : *(unsigned char *) (_fp)->_IO_read_ptr) 2025-05-07T19:46:34.0494426Z #define _GLIBCXX_TR1_ELL_INTEGRAL_TCC 1 2025-05-07T19:46:34.0494761Z #define w_termsig __wait_terminated.__w_termsig 2025-05-07T19:46:34.0495097Z #define __FLOAT_WORD_ORDER __BYTE_ORDER 2025-05-07T19:46:34.0495393Z #define __cudaCDP2GetErrorName 2025-05-07T19:46:34.0495682Z #define XATTR_SIZE_MAX 65536 2025-05-07T19:46:34.0495940Z #define be64toh(x) __bswap_64 (x) 2025-05-07T19:46:34.0496251Z #define __ASSERT_VOID_CAST static_cast 2025-05-07T19:46:34.0496563Z #define __cpp_variadic_templates 200704L 2025-05-07T19:46:34.0496877Z #define RAND_MAX 2147483647 2025-05-07T19:46:34.0497135Z #define _GLIBCXX_USE_C99_COMPLEX_TR1 1 2025-05-07T19:46:34.0497547Z #define __UINT_FAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0497854Z #define __SM_90_RT_H__ 2025-05-07T19:46:34.0498109Z #define __SIG_ATOMIC_TYPE__ int 2025-05-07T19:46:34.0498382Z #define __COMPAR_FN_T 2025-05-07T19:46:34.0498616Z #define __GID_T_TYPE __U32_TYPE 2025-05-07T19:46:34.0498892Z #define _IO_BAD_SEEN 0x4000 2025-05-07T19:46:34.0499355Z #define _PSTL_PRAGMA_MESSAGE_IMPL(x) _PSTL_PRAGMA(message(_PSTL_STRING_CONCAT(_PSTL_PRAGMA_LOCATION, x))) 
2025-05-07T19:46:34.0499870Z #define __DBL_MIN_10_EXP__ (-307) 2025-05-07T19:46:34.0500201Z #define __glibcxx_requires_sorted_pred(_First,_Last,_Pred) 2025-05-07T19:46:34.0500567Z #define __FINITE_MATH_ONLY__ 0 2025-05-07T19:46:34.0500853Z #define _PSTL_PRAGMA_SIMD_INCLUSIVE_SCAN(PRM) 2025-05-07T19:46:34.0501188Z #define cudaArrayColorAttachment 0x20 2025-05-07T19:46:34.0501503Z #define __cpp_variable_templates 201304L 2025-05-07T19:46:34.0501991Z #define cudaKernelNodeAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:34.0502527Z #define __cpp_lib_integral_constant_callable 201304 2025-05-07T19:46:34.0502844Z #define _GLIBCXX_HAVE_SINHF 1 2025-05-07T19:46:34.0503119Z #define MOD_TIMECONST ADJ_TIMECONST 2025-05-07T19:46:34.0503515Z #define __cpp_lib_result_of_sfinae 201210 2025-05-07T19:46:34.0529511Z #define __SM_30_INTRINSICS_H__ 2025-05-07T19:46:34.0529834Z #define __FLT32X_MAX_EXP__ 1024 2025-05-07T19:46:34.0530098Z #define _GLIBCXX_USE_WCHAR_T 1 2025-05-07T19:46:34.0530380Z #define _GLIBCXX_MATH_H 1 2025-05-07T19:46:34.0530650Z #define __u_char_defined 2025-05-07T19:46:34.0530952Z #define WIFEXITED(status) __WIFEXITED (__WAIT_INT (status)) 2025-05-07T19:46:34.0531320Z #define STA_PPSERROR 0x0800 2025-05-07T19:46:34.0531564Z #define _GLIBCXX_STD_A std 2025-05-07T19:46:34.0531820Z #define __FLT32_HAS_DENORM__ 1 2025-05-07T19:46:34.0532088Z #define _GLIBCXX_BEGIN_NAMESPACE_VERSION 2025-05-07T19:46:34.0532529Z #define __device_builtin_texture_type__ __location__(device_builtin_texture_type) 2025-05-07T19:46:34.0532942Z #define FP_INFINITE 1 2025-05-07T19:46:34.0533311Z #define _GLIBCXX11_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:34.0533726Z #define _IO_pid_t __pid_t 2025-05-07T19:46:34.0533968Z #define __UINT_FAST8_MAX__ 0xff 2025-05-07T19:46:34.0534242Z #define __LEAF , __leaf__ 2025-05-07T19:46:34.0534475Z #define PATH_MAX 4096 2025-05-07T19:46:34.0534735Z #define __cpp_rvalue_reference 200610L 
2025-05-07T19:46:34.0535056Z #define __LDBL_REDIR1(name,proto,alias) name proto 2025-05-07T19:46:34.0535636Z #define _LIMITS_H___ 2025-05-07T19:46:34.0535856Z #define __size_t 2025-05-07T19:46:34.0536093Z #define _GLIBCXX_HAVE_FREXPF 1 2025-05-07T19:46:34.0536620Z #define STA_RONLY (STA_PPSSIGNAL | STA_PPSJITTER | STA_PPSWANDER | STA_PPSERROR | STA_CLOCKERR | STA_NANO | STA_MODE | STA_CLK) 2025-05-07T19:46:34.0537193Z #define _GLIBCXX_HAVE_FREXPL 1 2025-05-07T19:46:34.0537499Z #define __cpp_nested_namespace_definitions 201411L 2025-05-07T19:46:34.0537810Z #define __DEC64_MAX_EXP__ 385 2025-05-07T19:46:34.0538258Z #define _WCHAR_T_DEFINED 2025-05-07T19:46:34.0538614Z #define __glibcxx_requires_can_decrement_range(_First1,_Last1,_First2) 2025-05-07T19:46:34.0539024Z #define MOD_STATUS ADJ_STATUS 2025-05-07T19:46:34.0539317Z #define _GLIBCXX_PURE __attribute__ ((__pure__)) 2025-05-07T19:46:34.0539749Z #define _GLIBCXX_HAVE_STDINT_H 1 2025-05-07T19:46:34.0540031Z #define __SIZEOF_PTHREAD_CONDATTR_T 4 2025-05-07T19:46:34.0540325Z #define __INT8_C(c) c 2025-05-07T19:46:34.0540589Z #define __cudaCDP2GetParameterBuffer 2025-05-07T19:46:34.0540888Z #define _GLIBCXX_HAVE_COSHF 1 2025-05-07T19:46:34.0541168Z #define _GLIBCXX_HAVE_COSHL 1 2025-05-07T19:46:34.0541423Z #define __SM_70_RT_HPP__ 2025-05-07T19:46:34.0541680Z #define __INT_LEAST8_WIDTH__ 8 2025-05-07T19:46:34.0541948Z #define __cpp_variadic_using 201611L 2025-05-07T19:46:34.0542280Z #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0542608Z #define __INT_LEAST8_MAX__ 0x7f 2025-05-07T19:46:34.0542898Z #define __SM_61_INTRINSICS_HPP__ 2025-05-07T19:46:34.0543175Z #define _IO_FLAGS2_MMAP 1 2025-05-07T19:46:34.0543603Z #define __cpp_capture_star_this 201603L 2025-05-07T19:46:34.0544120Z #define __cudaCDP2LaunchDeviceV2_ptsz 2025-05-07T19:46:34.0544430Z #define _GLIBCXX_HAVE_ENDIAN_H 1 2025-05-07T19:46:34.0544915Z #define __always_inline __inline __attribute__ ((__always_inline__)) 
2025-05-07T19:46:34.0545312Z #define NFDBITS __NFDBITS 2025-05-07T19:46:34.0545586Z #define _PSTL_PRAGMA_FORCEINLINE 2025-05-07T19:46:34.0545880Z #define _GLIBCXX_HAVE_SYS_STATVFS_H 1 2025-05-07T19:46:34.0546224Z #define __glibcxx_requires_sorted(_First,_Last) 2025-05-07T19:46:34.0546551Z #define __SHRT_MAX__ 0x7fff 2025-05-07T19:46:34.0546828Z #define _GLIBCXX_SYMVER_GNU 1 2025-05-07T19:46:34.0547134Z #define w_stopval __wait_stopped.__w_stopval 2025-05-07T19:46:34.0547442Z #define STA_UNSYNC 0x0040 2025-05-07T19:46:34.0547774Z #define __LDBL_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:46:34.0548201Z #define _GLIBCXX_USE_C99_COMPLEX _GLIBCXX11_USE_C99_COMPLEX 2025-05-07T19:46:34.0548587Z #define __FLT64X_MAX_10_EXP__ 4932 2025-05-07T19:46:34.0548878Z #define __cpp_if_constexpr 201606L 2025-05-07T19:46:34.0549211Z #define __glibcxx_class_requires4(_a,_b,_c,_d,_e) 2025-05-07T19:46:34.0549550Z #define _GLIBCXX_HAVE_WCHAR_H 1 2025-05-07T19:46:34.0549891Z #define _GLIBCXX_USE_C99_STDIO _GLIBCXX11_USE_C99_STDIO 2025-05-07T19:46:34.0550258Z #define __daddr_t_defined 2025-05-07T19:46:34.0550521Z #define __LDBL_IS_IEC_60559__ 2 2025-05-07T19:46:34.0550821Z #define _GLIBCXX_TR1_RIEMANN_ZETA_TCC 1 2025-05-07T19:46:34.0551163Z #define _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE 1 2025-05-07T19:46:34.0551717Z #define _PSTL_CPP11_STD_ROTATE_BROKEN ((__GLIBCXX__ && __GLIBCXX__ < 20150716) || (_MSC_VER && _MSC_VER < 1800)) 2025-05-07T19:46:34.0552231Z #define _ACRTIMP 2025-05-07T19:46:34.0552478Z #define _IO_EOF_SEEN 0x10 2025-05-07T19:46:34.0552747Z #define _GLIBCXX_TR1_POLY_LAGUERRE_TCC 1 2025-05-07T19:46:34.0553063Z #define _IOS_BIN 128 2025-05-07T19:46:34.0553425Z #define __fortify_function __extern_always_inline __attribute_artificial__ 2025-05-07T19:46:34.0553877Z #define __FLT64X_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0554156Z #define UNDERFLOW 4 2025-05-07T19:46:34.0554395Z #define NAME_MAX 255 2025-05-07T19:46:34.0554653Z #define SCHAR_MAX __SCHAR_MAX__ 
2025-05-07T19:46:34.0554932Z #define __UINT_LEAST8_MAX__ 0xff 2025-05-07T19:46:34.0555233Z #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 2025-05-07T19:46:34.0555535Z #define _IO_UNIFIED_JUMPTABLES 1 2025-05-07T19:46:34.0556050Z #define __FLT128_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966F128 2025-05-07T19:46:34.0556559Z #define __ptr_t void * 2025-05-07T19:46:34.0556807Z #define M_E 2.7182818284590452354 2025-05-07T19:46:34.0557071Z #define cudaSurfaceType1D 0x01 2025-05-07T19:46:34.0557345Z #define __USE_ISOCXX11 1 2025-05-07T19:46:34.0557604Z #define __UINTMAX_TYPE__ long unsigned int 2025-05-07T19:46:34.0558082Z #define cudaDeviceBlockingSync 0x04 2025-05-07T19:46:34.0558381Z #define CLOCK_MONOTONIC_COARSE 6 2025-05-07T19:46:34.0558642Z #define _GLIBCXX_OS_DEFINES 1 2025-05-07T19:46:34.0558927Z #define _GLIBCXX_NODISCARD [[__nodiscard__]] 2025-05-07T19:46:34.0559226Z #define cudaSurfaceType2D 0x02 2025-05-07T19:46:34.0559485Z #define __linux 1 2025-05-07T19:46:34.0559766Z #define __DEC32_EPSILON__ 1E-6DF 2025-05-07T19:46:34.0560041Z #define cudaDeviceMask 0xff 2025-05-07T19:46:34.0560293Z #define _GLIBCXX_END_NAMESPACE_ALGO 2025-05-07T19:46:34.0560583Z #define __CUDA_API_VER_MAJOR__ 12 2025-05-07T19:46:34.0560872Z #define htobe16(x) __bswap_16 (x) 2025-05-07T19:46:34.0561142Z #define HUGE_VALF (__builtin_huge_valf()) 2025-05-07T19:46:34.0561447Z #define __FLT_EVAL_METHOD_TS_18661_3__ 0 2025-05-07T19:46:34.0561738Z #define HUGE_VALL (__builtin_huge_vall()) 2025-05-07T19:46:34.0562032Z #define _BITS_TYPES_H 1 2025-05-07T19:46:34.0562301Z #define ULONG_LONG_MAX (LONG_LONG_MAX * 2ULL + 1ULL) 2025-05-07T19:46:34.0562645Z #define _IO_cleanup_region_end(_Doit) 2025-05-07T19:46:34.0562929Z #define cudaSurfaceType3D 0x03 2025-05-07T19:46:34.0563217Z #define _GLIBCXX_HAVE_SYS_TIME_H 1 2025-05-07T19:46:34.0563493Z #define __cudaGet_blockIdx() blockIdx 2025-05-07T19:46:34.0563791Z #define _IO_DONT_CLOSE 0100000 2025-05-07T19:46:34.0564575Z #define 
__MATHDECLX(type,function,suffix,args,attrib) __MATHDECL_1(type, function,suffix, args) __attribute__ (attrib); __MATHDECL_1(type, __CONCAT(__,function),suffix, args) __attribute__ (attrib) 2025-05-07T19:46:34.0565375Z #define cudaHostRegisterDefault 0x00 2025-05-07T19:46:34.0565658Z #define __unix 1 2025-05-07T19:46:34.0565861Z #define MATH_ERRNO 1 2025-05-07T19:46:34.0566104Z #define _GLIBCXX_STDIO_SEEK_END 2 2025-05-07T19:46:34.0566366Z #define _GLIBCXX_USE_FCHMODAT 1 2025-05-07T19:46:34.0566629Z #define __SM_100_RT_H__ 2025-05-07T19:46:34.0566867Z #define __UINT32_MAX__ 0xffffffffU 2025-05-07T19:46:34.0567158Z #define __GXX_EXPERIMENTAL_CXX0X__ 1 2025-05-07T19:46:34.0567446Z #define __UID_T_TYPE __U32_TYPE 2025-05-07T19:46:34.0567702Z #define _GLIBCXX20_DEPRECATED(MSG) 2025-05-07T19:46:34.0568010Z #define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1 2025-05-07T19:46:34.0568458Z #define __CUDART_API_VERSION ((__CUDA_API_VER_MAJOR__ * 1000) + (__CUDA_API_VER_MINOR__ * 10)) 2025-05-07T19:46:34.0568925Z #define __nv_pure__ __location__(nv_pure) 2025-05-07T19:46:34.0569217Z #define CUDARTAPI_CDECL 2025-05-07T19:46:34.0569479Z #define _PSTL_USAGE_WARNINGS 0 2025-05-07T19:46:34.0569741Z #define _GLIBCXX98_USE_C99_COMPLEX 1 2025-05-07T19:46:34.0570028Z #define __cpp_lib_void_t 201411 2025-05-07T19:46:34.0570297Z #define _POSIX_AIO_MAX 1 2025-05-07T19:46:34.0570529Z #define __SIZE_T 2025-05-07T19:46:34.0570781Z #define isgraph_l(c,l) __isgraph_l ((c), (l)) 2025-05-07T19:46:34.0571091Z #define _GLIBCXX_FULLY_DYNAMIC_STRING 0 2025-05-07T19:46:34.0571394Z #define _POSIX_PIPE_BUF 512 2025-05-07T19:46:34.0571646Z #define __CUDA_RUNTIME_API_H__ 2025-05-07T19:46:34.0571919Z #define _GLIBCXX_HAVE_STRTOLD 1 2025-05-07T19:46:34.0572171Z #define _ATFILE_SOURCE 1 2025-05-07T19:46:34.0572565Z #define __glibcxx_assert(cond) do { __glibcxx_constexpr_assert(cond); } while (false) 2025-05-07T19:46:34.0572993Z #define __WAIT_STATUS void * 2025-05-07T19:46:34.0573265Z #define __MATH_FUNCTIONS_H__ 
2025-05-07T19:46:34.0573541Z #define _GLIBCXX_HAVE_WCSTOF 1 2025-05-07T19:46:34.0573805Z #define __FLT128_MIN_EXP__ (-16381) 2025-05-07T19:46:34.0574094Z #define _GLIBCXX_HAVE_LC_MESSAGES 1 2025-05-07T19:46:34.0574355Z #define __WINT_MIN__ 0U 2025-05-07T19:46:34.0574939Z #define _PSTL_CPP14_VARIABLE_TEMPLATES_PRESENT (!__INTEL_COMPILER || __INTEL_COMPILER >= 1700) && (_MSC_FULL_VER >= 190023918 || __cplusplus >= 201402L) 2025-05-07T19:46:34.0575657Z #define isdigit_l(c,l) __isdigit_l ((c), (l)) 2025-05-07T19:46:34.0575960Z #define WUNTRACED 2 2025-05-07T19:46:34.0576180Z #define _GLIBCXX_HAVE_SQRTF 1 2025-05-07T19:46:34.0576462Z #define __SIZEOF_PTHREAD_RWLOCKATTR_T 8 2025-05-07T19:46:34.0576745Z #define NZERO 20 2025-05-07T19:46:34.0576960Z #define _GLIBCXX_HAVE_MEMALIGN 1 2025-05-07T19:46:34.0577239Z #define _PSTL_PRAGMA(x) _Pragma(#x) 2025-05-07T19:46:34.0577517Z #define MOD_CLKA ADJ_OFFSET_SINGLESHOT 2025-05-07T19:46:34.0577806Z #define MOD_CLKB ADJ_TICK 2025-05-07T19:46:34.0578049Z #define __FLT128_MIN_10_EXP__ (-4931) 2025-05-07T19:46:34.0578335Z #define __FLT32X_IS_IEC_60559__ 2 2025-05-07T19:46:34.0578655Z #define __DEVICE_FUNCTIONS_H__ 2025-05-07T19:46:34.0578938Z #define SCHAR_MIN (-SCHAR_MAX - 1) 2025-05-07T19:46:34.0579208Z #define EXIT_FAILURE 1 2025-05-07T19:46:34.0579452Z #define ADJ_MAXERROR 0x0004 2025-05-07T19:46:34.0579726Z #define __INT_LEAST16_WIDTH__ 16 2025-05-07T19:46:34.0579981Z #define _SIZE_T_DEFINED_ 2025-05-07T19:46:34.0580236Z #define _POSIX_AIO_LISTIO_MAX 2 2025-05-07T19:46:34.0580501Z #define __cudaCDP2DeviceGetLimit 2025-05-07T19:46:34.0580837Z #define __LDBL_REDIR_NTH(name,proto) name proto __THROW 2025-05-07T19:46:34.0581177Z #define __cudaCDP2FuncGetAttributes 2025-05-07T19:46:34.0581468Z #define __SCHAR_MAX__ 0x7f 2025-05-07T19:46:34.0581706Z #define __FLT128_MANT_DIG__ 113 2025-05-07T19:46:34.0581977Z #define __USING_NAMESPACE_STD(name) 2025-05-07T19:46:34.0582261Z #define _GLIBCXX_HAVE_OBSOLETE_ISINF 1 
2025-05-07T19:46:34.0582573Z #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1) 2025-05-07T19:46:34.0582867Z #define SEEK_DATA 3 2025-05-07T19:46:34.0583093Z #define __KERNEL_STRICT_NAMES 2025-05-07T19:46:34.0583508Z #define _IO_stderr ((_IO_FILE*)(&_IO_2_1_stderr_)) 2025-05-07T19:46:34.0584122Z #define _IO_ferror_unlocked(__fp) (((__fp)->_flags & _IO_ERR_SEEN) != 0) 2025-05-07T19:46:34.0584606Z #define _FUNCTEXCEPT_H 1 2025-05-07T19:46:34.0584861Z #define __INT64_C(c) c ## L 2025-05-07T19:46:34.0585159Z #define __NTH(fct) __LEAF_ATTR fct throw () 2025-05-07T19:46:34.0585504Z #define _GLIBCXX_CONST __attribute__ ((__const__)) 2025-05-07T19:46:34.0585855Z #define _GLIBCXX_HAVE_LINK 1 2025-05-07T19:46:34.0586153Z #define cudaNvSciSyncAttrWait 0x2 2025-05-07T19:46:34.0586454Z #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 2025-05-07T19:46:34.0586778Z #define STA_PPSWANDER 0x0400 2025-05-07T19:46:34.0587037Z #define __INT_WCHAR_T_H 2025-05-07T19:46:34.0587295Z #define WSTOPPED 2 2025-05-07T19:46:34.0587540Z #define _POSIX_THREAD_THREADS_MAX 64 2025-05-07T19:46:34.0587851Z #define _POSIX_MQ_OPEN_MAX 8 2025-05-07T19:46:34.0588107Z #define FP_NORMAL 4 2025-05-07T19:46:34.0588375Z #define __cudaCDP2LaunchDevice_ptsz 2025-05-07T19:46:34.0588664Z #define _BITS_TIMEX_H 1 2025-05-07T19:46:34.0588921Z #define _POSIX_LINK_MAX 8 2025-05-07T19:46:34.0589200Z #define _GLIBCXX_HAVE_LIMIT_FSIZE 1 2025-05-07T19:46:34.0589493Z #define _GLIBCXX_HAVE_ATAN2F 1 2025-05-07T19:46:34.0589787Z #define cudaTextureType1D 0x01 2025-05-07T19:46:34.0590061Z #define _GLIBCXX_HAVE_ATAN2L 1 2025-05-07T19:46:34.0590344Z #define COLL_WEIGHTS_MAX 255 2025-05-07T19:46:34.0590618Z #define __isascii(c) (((c) & ~0x7f) == 0) 2025-05-07T19:46:34.0590937Z #define __toascii(c) ((c) & 0x7f) 2025-05-07T19:46:34.0591382Z #define __attribute_format_strfmon__(a,b) __attribute__ ((__format__ (__strfmon__, a, b))) 2025-05-07T19:46:34.0591866Z #define _IO_MAGIC 0xFBAD0000 2025-05-07T19:46:34.0592140Z #define _GLIBCXX_USE_SENDFILE 1 
2025-05-07T19:46:34.0592422Z #define _POSIX_SOURCE 1 2025-05-07T19:46:34.0592692Z #define cudaTextureType2D 0x02 2025-05-07T19:46:34.0592962Z #define _PTR_TRAITS_H 1 2025-05-07T19:46:34.0593256Z #define _GLIBCXX_NOEXCEPT_QUAL noexcept (_NE) 2025-05-07T19:46:34.0593580Z #define _GLIBCXX_HAVE_POWF 1 2025-05-07T19:46:34.0593866Z #define _POSIX2_BC_STRING_MAX 1000 2025-05-07T19:46:34.0594194Z #define __attribute_used__ __attribute__ ((__used__)) 2025-05-07T19:46:34.0598867Z #define cudaTextureType3D 0x03 2025-05-07T19:46:34.0599136Z #define _STDIO_USES_IOSTREAM 2025-05-07T19:46:34.0599413Z #define CLOCK_REALTIME 0 2025-05-07T19:46:34.0599654Z #define __FLT32X_MANT_DIG__ 53 2025-05-07T19:46:34.0599936Z #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 2025-05-07T19:46:34.0600245Z #define __cpp_aligned_new 201606L 2025-05-07T19:46:34.0600511Z #define __USER_LABEL_PREFIX__ 2025-05-07T19:46:34.0600787Z #define cudaEventBlockingSync 0x01 2025-05-07T19:46:34.0601059Z #define _GLIBCXX_HAVE_TANL 1 2025-05-07T19:46:34.0601336Z #define _GLIBCXX_USE_PTHREAD_RWLOCK_T 1 2025-05-07T19:46:34.0601628Z #define _GLIBCXX_HAVE_LINUX_RANDOM_H 1 2025-05-07T19:46:34.0601923Z #define _GLIBCXX_USE_C99_FENV_TR1 1 2025-05-07T19:46:34.0602256Z #define __FLT32_MAX_10_EXP__ 38 2025-05-07T19:46:34.0602521Z #define __GLIBC__ 2 2025-05-07T19:46:34.0602727Z #define __END_DECLS } 2025-05-07T19:46:34.0602970Z #define FP_ILOGB0 (-2147483647 - 1) 2025-05-07T19:46:34.0603335Z #define __FLT64X_EPSILON__ 1.08420217248550443400745280086994171e-19F64x 2025-05-07T19:46:34.0603711Z #define __CONCAT(x,y) x ## y 2025-05-07T19:46:34.0603967Z #define WCONTINUED 8 2025-05-07T19:46:34.0604188Z #define __STDC_HOSTED__ 1 2025-05-07T19:46:34.0604448Z #define _GLIBCXX_HAVE_ARPA_INET_H 1 2025-05-07T19:46:34.0604713Z #define _ALLOCA_H 1 2025-05-07T19:46:34.0604957Z #define __host__ __location__(host) 2025-05-07T19:46:34.0605363Z #define __warndecl(name,msg) extern void name (void) __attribute__((__warning__ (msg))) 
2025-05-07T19:46:34.0605804Z #define __SLONG32_TYPE int 2025-05-07T19:46:34.0606067Z #define _GLIBCXX_DEBUG_ASSERTIONS_H 1 2025-05-07T19:46:34.0606339Z #define _SYS_SELECT_H 1 2025-05-07T19:46:34.0606582Z #define _IO_LINE_BUF 0x200 2025-05-07T19:46:34.0606821Z #define _IOS_NOCREATE 32 2025-05-07T19:46:34.0607074Z #define __DEC64_MIN_EXP__ (-382) 2025-05-07T19:46:34.0607338Z #define __cudaGet_warpSize() warpSize 2025-05-07T19:46:34.0607633Z #define __SSIZE_T_TYPE __SWORD_TYPE 2025-05-07T19:46:34.0607910Z #define _GLIBCXX_HAVE_LIMIT_VMEM 0 2025-05-07T19:46:34.0608198Z #define __global__ __location__(global) 2025-05-07T19:46:34.0608474Z #define __GNU_LIBRARY__ 6 2025-05-07T19:46:34.0608935Z #define __cpp_decltype_auto 201304L 2025-05-07T19:46:34.0609421Z #define __DBL_DIG__ 15 2025-05-07T19:46:34.0609661Z #define TIME_UTC 1 2025-05-07T19:46:34.0610037Z #define __FLT32_DIG__ 6 2025-05-07T19:46:34.0610374Z #define __forceinline__ __inline__ __attribute__((always_inline)) 2025-05-07T19:46:34.0610801Z #define cudaHostAllocWriteCombined 0x04 2025-05-07T19:46:34.0611127Z #define cudaDeviceScheduleAuto 0x00 2025-05-07T19:46:34.0611457Z #define iscntrl_l(c,l) __iscntrl_l ((c), (l)) 2025-05-07T19:46:34.0611769Z #define _G_BUFSIZ 8192 2025-05-07T19:46:34.0612099Z #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F 2025-05-07T19:46:34.0612485Z #define cudaTextureTypeCubemap 0x0C 2025-05-07T19:46:34.0612804Z #define __cudaCDP2GetDevice 2025-05-07T19:46:34.0613104Z #define __cudaCDP2PeekAtLastError 2025-05-07T19:46:34.0613399Z #define STA_CLOCKERR 0x1000 2025-05-07T19:46:34.0613676Z #define __GXX_WEAK__ 1 2025-05-07T19:46:34.0613939Z #define __RLIM_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:34.0614271Z #define _GLIBCXX_HAVE_ISNANF 1 2025-05-07T19:46:34.0614541Z #define __SHRT_WIDTH__ 16 2025-05-07T19:46:34.0614866Z #define __cpp_lib_robust_nonmodifying_seq_ops 201304 2025-05-07T19:46:34.0615221Z #define _GLIBCXX_BITS_SPECFUN_H 1 2025-05-07T19:46:34.0615523Z #define 
_GLIBCXX_HAVE_ISNANL 1 2025-05-07T19:46:34.0615842Z #define isblank_l(c,l) __isblank_l ((c), (l)) 2025-05-07T19:46:34.0616149Z #define _G_config_h 1 2025-05-07T19:46:34.0616448Z #define M_LOG2El 1.442695040888963407359924681001892137L 2025-05-07T19:46:34.0616794Z #define ADJ_OFFSET_SINGLESHOT 0x8001 2025-05-07T19:46:34.0617097Z #define _GCC_WCHAR_T 2025-05-07T19:46:34.0617329Z #define TMP_MAX 238328 2025-05-07T19:46:34.0617583Z #define __FLT32_IS_IEC_60559__ 2 2025-05-07T19:46:34.0617854Z #define __DEVICE_TYPES_H__ 2025-05-07T19:46:34.0618315Z #define __DEV_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:34.0618596Z #define _EXT_NUMERIC_TRAITS 1 2025-05-07T19:46:34.0618889Z #define _GLIBCXX_BEGIN_NAMESPACE_ALGO 2025-05-07T19:46:34.0619199Z #define _IO_SKIPWS 01 2025-05-07T19:46:34.0619609Z #define cudaStreamGraphFireAndForgetAsSibling (cudaStream_t)0x0300000000000000 2025-05-07T19:46:34.0620102Z #define _IO_SCIENTIFIC 04000 2025-05-07T19:46:34.0620376Z #define _GLIBCXX_HAVE_STRING_H 1 2025-05-07T19:46:34.0620730Z #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L 2025-05-07T19:46:34.0621109Z #define cudaDeviceScheduleSpin 0x01 2025-05-07T19:46:34.0621621Z #define __nonnull(params) __attribute__ ((__nonnull__ params)) 2025-05-07T19:46:34.0621994Z #define __DBL_IS_IEC_60559__ 2 2025-05-07T19:46:34.0622345Z #define le32toh(x) (x) 2025-05-07T19:46:34.0622582Z #define _SIZE_T_DEFINED 2025-05-07T19:46:34.0622853Z #define _GLIBCXX_HAVE_XLOCALE_H 1 2025-05-07T19:46:34.0623206Z #define cudaArraySparsePropertiesSingleMipTail 0x1 2025-05-07T19:46:34.0623663Z #define __DEC32_MAX__ 9.999999E96DF 2025-05-07T19:46:34.0624282Z #define __WIFSIGNALED(status) (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) 2025-05-07T19:46:34.0624761Z #define _GLIBCXX_HAVE_FMODL 1 2025-05-07T19:46:34.0625057Z #define _GLIBCXX_HAVE_POLL 1 2025-05-07T19:46:34.0625329Z #define __SM_32_INTRINSICS_H__ 2025-05-07T19:46:34.0625620Z #define _POSIX_NAME_MAX 14 2025-05-07T19:46:34.0625906Z #define 
__cpp_threadsafe_static_init 200806L 2025-05-07T19:46:34.0626468Z #define _GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(_Iter) std::__make_move_if_noexcept_iterator(_Iter) 2025-05-07T19:46:34.0627017Z #define _GLIBCXX_USE_CLOCK_REALTIME 1 2025-05-07T19:46:34.0627338Z #define __cpp_enumerator_attributes 201411L 2025-05-07T19:46:34.0627712Z #define __WCOREDUMP(status) ((status) & __WCOREFLAG) 2025-05-07T19:46:34.0628041Z #define _WCHAR_T_ 2025-05-07T19:46:34.0628289Z #define _GLIBCXX_FAST_MATH 0 2025-05-07T19:46:34.0628666Z #define __FLT64X_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951F64x 2025-05-07T19:46:34.0629090Z #define RTSIG_MAX 32 2025-05-07T19:46:34.0629320Z #define _STDDEF_H 2025-05-07T19:46:34.0629566Z #define CU_UUID_HAS_BEEN_DEFINED 2025-05-07T19:46:34.0629862Z #define _VA_LIST_DEFINED 2025-05-07T19:46:34.0630119Z #define __FLT32X_HAS_INFINITY__ 1 2025-05-07T19:46:34.0630477Z #define __glibcxx_requires_non_empty_range(_First,_Last) 2025-05-07T19:46:34.0630881Z #define __grid_constant__ __location__(grid_constant) 2025-05-07T19:46:34.0631239Z #define __INT32_MAX__ 0x7fffffff 2025-05-07T19:46:34.0631541Z #define _GLIBCXX_BEGIN_EXTERN_C extern "C" { 2025-05-07T19:46:34.0632038Z #define _PSTL_CPP14_INTEGER_SEQUENCE_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L) 2025-05-07T19:46:34.0632588Z #define __glibcxx_digits_b(T,B) (B - __glibcxx_signed_b (T,B)) 2025-05-07T19:46:34.0632989Z #define __SIZEOF_PTHREAD_COND_T 48 2025-05-07T19:46:34.0633331Z #define _PSTL_PRAGMA_SIMD_ORDERED_MONOTONIC(PRM) 2025-05-07T19:46:34.0633654Z #define __unix__ 1 2025-05-07T19:46:34.0633916Z #define __SM_60_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:34.0634203Z #define __INT_WIDTH__ 32 2025-05-07T19:46:34.0634465Z #define __SIZEOF_LONG__ 8 2025-05-07T19:46:34.0634702Z #define _IONBF 2 2025-05-07T19:46:34.0635176Z #define __MATHCALLX(function,suffix,args,attrib) __MATHDECLX (_Mdouble_,function,suffix, args, attrib) 2025-05-07T19:46:34.0636100Z #define _IO_getc_unlocked(_fp) 
(_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++) 2025-05-07T19:46:34.0636647Z #define __STDC_IEC_559__ 1 2025-05-07T19:46:34.0636901Z #define __STDC_ISO_10646__ 201103L 2025-05-07T19:46:34.0637157Z #define __UINT16_C(c) c 2025-05-07T19:46:34.0637402Z #define M_2_PI 0.63661977236758134308 2025-05-07T19:46:34.0637663Z #define STA_DEL 0x0020 2025-05-07T19:46:34.0637912Z #define __CUDACC_VER_MINOR__ 8 2025-05-07T19:46:34.0638158Z #define __id_t_defined 2025-05-07T19:46:34.0638428Z #define w_retcode __wait_terminated.__w_retcode 2025-05-07T19:46:34.0638867Z #define _IO_PENDING_OUTPUT_COUNT(_fp) ((_fp)->_IO_write_ptr - (_fp)->_IO_write_base) 2025-05-07T19:46:34.0639401Z #define _GLIBCXX_HAVE_MODFF 1 2025-05-07T19:46:34.0639675Z #define _GLIBCXX_HAVE_MODFL 1 2025-05-07T19:46:34.0639930Z #define __DECIMAL_DIG__ 21 2025-05-07T19:46:34.0640189Z #define _POSIX2_RE_DUP_MAX 255 2025-05-07T19:46:34.0640439Z #define __USE_FORTIFY_LEVEL 0 2025-05-07T19:46:34.0640709Z #define __STDC_IEC_559_COMPLEX__ 1 2025-05-07T19:46:34.0640962Z #define SING 2 2025-05-07T19:46:34.0641186Z #define STA_FREQHOLD 0x0080 2025-05-07T19:46:34.0641441Z #define __SM_32_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0641744Z #define cudaStreamDefault 0x00 2025-05-07T19:46:34.0642078Z #define __FLT64_EPSILON__ 2.22044604925031308084726333618164062e-16F64 2025-05-07T19:46:34.0642512Z #define _GLIBCXX_HAVE_HYPOTL 1 2025-05-07T19:46:34.0642790Z #define _GLIBCXX_HAVE_SYS_UIO_H 1 2025-05-07T19:46:34.0643045Z #define __gnu_linux__ 1 2025-05-07T19:46:34.0643285Z #define __INT16_MAX__ 0x7fff 2025-05-07T19:46:34.0643530Z #define _LARGEFILE_SOURCE 1 2025-05-07T19:46:34.0643792Z #define MAX_INPUT 255 2025-05-07T19:46:34.0644022Z #define __FLT64_MIN_EXP__ (-1021) 2025-05-07T19:46:34.0644352Z #define __isalpha_l(c,l) __isctype_l((c), _ISalpha, (l)) 2025-05-07T19:46:34.0644707Z #define __glibcxx_requires_heap(_First,_Last) 2025-05-07T19:46:34.0645029Z #define 
_GLIBCXX_CPU_DEFINES 1 2025-05-07T19:46:34.0645282Z #define _GLIBCXX_HAVE_POLL_H 1 2025-05-07T19:46:34.0645691Z #define __attribute_warn_unused_result__ __attribute__ ((__warn_unused_result__)) 2025-05-07T19:46:34.0646103Z #define _IO_SHOWPOS 02000 2025-05-07T19:46:34.0646410Z #define _GLIBCXX_HAVE_SYMVER_SYMBOL_RENAMING_RUNTIME_SUPPORT 1 2025-05-07T19:46:34.0646761Z #define _Mfloat_ float 2025-05-07T19:46:34.0647011Z #define __glibcxx_requires_cond(_Cond,_Msg) 2025-05-07T19:46:34.0647304Z #define __FLT64X_MIN_10_EXP__ (-4931) 2025-05-07T19:46:34.0647564Z #define DELAYTIMER_MAX 2147483647 2025-05-07T19:46:34.0647871Z #define cudaMemPoolCreateUsageHwDecompress 0x2 2025-05-07T19:46:34.0648415Z #define __glibcxx_max_b(T,B) (__glibcxx_signed_b (T,B) ? (((((T)1 << (__glibcxx_digits_b (T,B) - 1)) - 1) << 1) + 1) : ~(T)0) 2025-05-07T19:46:34.0648891Z #define __LDBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0649163Z #define _GLIBCXX98_USE_C99_STDIO 1 2025-05-07T19:46:34.0649466Z #define cudaKernelNodeAttrID cudaLaunchAttributeID 2025-05-07T19:46:34.0649821Z #define __glibcxx_class_requires2(_a,_b,_c) 2025-05-07T19:46:34.0650103Z #define __USE_ISOC11 1 2025-05-07T19:46:34.0650338Z #define _BSD_SIZE_T_ 2025-05-07T19:46:34.0650571Z #define ADJ_MICRO 0x1000 2025-05-07T19:46:34.0650829Z #define _GLIBCXX_HAVE_FABSF 1 2025-05-07T19:46:34.0651104Z #define _GLIBCXX_HAVE_FABSL 1 2025-05-07T19:46:34.0651384Z #define _PSTL_PRAGMA_SIMD _PSTL_PRAGMA(omp simd) 2025-05-07T19:46:34.0651705Z #define __FLT64_MANT_DIG__ 53 2025-05-07T19:46:34.0651983Z #define __attribute_const__ __attribute__ ((__const__)) 2025-05-07T19:46:34.0652317Z #define __THROW throw () 2025-05-07T19:46:34.0652552Z #define __cudaGet_gridDim() gridDim 2025-05-07T19:46:34.0652839Z #define __SM_60_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0653169Z #define __glibcxx_requires_heap_pred(_First,_Last,_Pred) 2025-05-07T19:46:34.0653506Z #define htobe32(x) __bswap_32 (x) 2025-05-07T19:46:34.0653782Z #define _GLIBCXX_HAVE_POWL 1 
2025-05-07T19:46:34.0654013Z #define __FLT64X_MANT_DIG__ 64 2025-05-07T19:46:34.0654278Z #define __GLIBC_HAVE_LONG_LONG 1 2025-05-07T19:46:34.0654526Z #define L_tmpnam 20 2025-05-07T19:46:34.0654753Z #define ___int_wchar_t_h 2025-05-07T19:46:34.0655076Z #define WIFCONTINUED(status) __WIFCONTINUED (__WAIT_INT (status)) 2025-05-07T19:46:34.0655473Z #define isascii(c) __isascii (c) 2025-05-07T19:46:34.0655711Z #define _T_PTRDIFF 2025-05-07T19:46:34.0656020Z #define _GLIBCXX_MOVE3(_Tp,_Up,_Vp) std::move(_Tp, _Up, _Vp) 2025-05-07T19:46:34.0656362Z #define toascii(c) __toascii (c) 2025-05-07T19:46:34.0656626Z #define __GNUC__ 11 2025-05-07T19:46:34.0656880Z #define __SYSCALL_ULONG_TYPE __ULONGWORD_TYPE 2025-05-07T19:46:34.0657240Z #define __GXX_RTTI 1 2025-05-07T19:46:34.0657463Z #define __pie__ 2 2025-05-07T19:46:34.0657652Z #define __MMX__ 1 2025-05-07T19:46:34.0657865Z #define __cudaCDP2Malloc 2025-05-07T19:46:34.0658096Z #define __timespec_defined 1 2025-05-07T19:46:34.0658329Z #define L_ctermid 9 2025-05-07T19:46:34.0658535Z #define __OFF64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:34.0658830Z #define __cudaCDP2GetParameterBufferV2 2025-05-07T19:46:34.0659196Z #define offsetof(TYPE,MEMBER) __builtin_offsetof (TYPE, MEMBER) 2025-05-07T19:46:34.0659568Z #define _BITS_POSIX2_LIM_H 1 2025-05-07T19:46:34.0659824Z #define _GLIBCXX98_USE_C99_STDLIB 1 2025-05-07T19:46:34.0660094Z #define cudaMemAttachGlobal 0x01 2025-05-07T19:46:34.0660393Z #define FD_SET(fd,fdsetp) __FD_SET (fd, fdsetp) 2025-05-07T19:46:34.0660768Z #define __FLT_HAS_DENORM__ 1 2025-05-07T19:46:34.0661034Z #define __SIZEOF_LONG_DOUBLE__ 16 2025-05-07T19:46:34.0661458Z #define _GLIBCXX_NATIVE_THREAD_ID (__gthread_active_p() ? __gthread_self() : (__gthread_t)1) 2025-05-07T19:46:34.0662358Z #define assert_perror(errnum) (!(errnum) ? 
__ASSERT_VOID_CAST (0) : __assert_perror_fail ((errnum), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:34.0662959Z #define _IO_HAVE_ST_BLKSIZE _G_HAVE_ST_BLKSIZE 2025-05-07T19:46:34.0663243Z #define __USE_SVID 1 2025-05-07T19:46:34.0663579Z #define __constant__ __location__(constant) 2025-05-07T19:46:34.0664074Z #define _GLIBCXX_HAVE_POSIX_MEMALIGN 1 2025-05-07T19:46:34.0664404Z #define __device__ __location__(device) 2025-05-07T19:46:34.0664787Z #define _GLIBCXX_HAVE_EXCEPTION_PTR_SINCE_GCC46 1 2025-05-07T19:46:34.0665145Z #define _GLIBCXX_RES_LIMITS 1 2025-05-07T19:46:34.0665408Z #define M_1_PI 0.31830988618379067154 2025-05-07T19:46:34.0665718Z #define CUDART_DEVICE __device__ 2025-05-07T19:46:34.0666078Z #define __LDBL_REDIR1_NTH(name,proto,alias) name proto __THROW 2025-05-07T19:46:34.0666470Z #define M_PI_2 1.57079632679489661923 2025-05-07T19:46:34.0666776Z #define __BIGGEST_ALIGNMENT__ 16 2025-05-07T19:46:34.0667139Z #define cudaExternalSemaphoreWaitSkipNvSciBufMemSync 0x02 2025-05-07T19:46:34.0667543Z #define __STDC_UTF_16__ 1 2025-05-07T19:46:34.0667784Z #define LONG_MAX __LONG_MAX__ 2025-05-07T19:46:34.0668174Z #define __glibcxx_digits10_b(T,B) (__glibcxx_digits_b (T,B) * 643L / 2136) 2025-05-07T19:46:34.0668607Z #define _POSIX_THREAD_DESTRUCTOR_ITERATIONS 4 2025-05-07T19:46:34.0668943Z #define _POSIX_HOST_NAME_MAX 255 2025-05-07T19:46:34.0669206Z #define __FLT64_MAX_10_EXP__ 308 2025-05-07T19:46:34.0669485Z #define NGROUPS_MAX 65536 2025-05-07T19:46:34.0669760Z #define _GLIBCXX_NAMESPACE_LDBL 2025-05-07T19:46:34.0670029Z #define __USE_ISOC95 1 2025-05-07T19:46:34.0670272Z #define _TIME_H 1 2025-05-07T19:46:34.0670541Z #define M_LOG10El 0.434294481903251827651128918916605082L 2025-05-07T19:46:34.0670900Z #define __USE_ISOC99 1 2025-05-07T19:46:34.0671221Z #define __ASMNAME(cname) __ASMNAME2 (__USER_LABEL_PREFIX__, cname) 2025-05-07T19:46:34.0671617Z #define HOST_NAME_MAX 64 2025-05-07T19:46:34.0671863Z #define _POSIX_SEM_NSEMS_MAX 256 
2025-05-07T19:46:34.0672150Z #define _IOS_ATEND 4 2025-05-07T19:46:34.0672383Z #define __SM_35_INTRINSICS_H__ 2025-05-07T19:46:34.0672725Z #define WTERMSIG(status) __WTERMSIG (__WAIT_INT (status)) 2025-05-07T19:46:34.0673149Z #define cudaStreamAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:34.0673507Z #define _GLIBCXX_HAVE_S_ISREG 1 2025-05-07T19:46:34.0673810Z #define cudaSurfaceTypeCubemap 0x0C 2025-05-07T19:46:34.0674132Z #define __cpp_delegating_constructors 200604L 2025-05-07T19:46:34.0674479Z #define __FLT32_HAS_INFINITY__ 1 2025-05-07T19:46:34.0674733Z #define _STDIO_H 1 2025-05-07T19:46:34.0675156Z #define __isctype_l(c,type,locale) ((locale)->__ctype_b[(int) (c)] & (unsigned short int) type) 2025-05-07T19:46:34.0675638Z #define _GLIBCXX_PREDEFINED_OPS_H 1 2025-05-07T19:46:34.0676014Z #define __DBL_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:46:34.0676502Z #define _G_IO_IO_FILE_VERSION 0x20001 2025-05-07T19:46:34.0676764Z #define _POSIX_SIGQUEUE_MAX 32 2025-05-07T19:46:34.0677104Z #define _GLIBCXX_HAVE_GETS 1 2025-05-07T19:46:34.0677349Z #define _GLIBCXX_HAVE_LINUX_TYPES_H 1 2025-05-07T19:46:34.0677617Z #define __cpp_raw_strings 200710L 2025-05-07T19:46:34.0677886Z #define __INT_FAST32_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0678188Z #define _GLIBCXX_HAVE_VFWSCANF 1 2025-05-07T19:46:34.0678432Z #define __DBL_HAS_INFINITY__ 1 2025-05-07T19:46:34.0678696Z #define __STDCPP_MATH_SPEC_FUNCS__ 201003L 2025-05-07T19:46:34.0678986Z #define _GLIBCXX_STDIO_EOF -1 2025-05-07T19:46:34.0679227Z #define __SIZEOF_PTHREAD_MUTEX_T 40 2025-05-07T19:46:34.0679499Z #define __CHANNEL_DESCRIPTOR_H__ 2025-05-07T19:46:34.0679823Z #define _ISbit(bit) ((bit) < 8 ? 
((1 << (bit)) << 8) : ((1 << (bit)) >> 8)) 2025-05-07T19:46:34.0680181Z #define __SIZEOF_FLOAT__ 4 2025-05-07T19:46:34.0680467Z #define __USE_XOPEN 1 2025-05-07T19:46:34.0680702Z #define __SIZEOF_PTHREAD_RWLOCK_T 56 2025-05-07T19:46:34.0681119Z #define cudaStreamAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:34.0681545Z #define __USE_XOPEN2K 1 2025-05-07T19:46:34.0681766Z #define _PSTL_UDR_PRESENT 1 2025-05-07T19:46:34.0682021Z #define __HAVE_SPECULATION_SAFE_VALUE 1 2025-05-07T19:46:34.0682300Z #define _GLIBCXX_HAVE_COSF 1 2025-05-07T19:46:34.0682545Z #define __cpp_fold_expressions 201603L 2025-05-07T19:46:34.0683041Z #define cudaWaitExternalSemaphoresAsync __CUDART_API_PTSZ(cudaWaitExternalSemaphoresAsync_v2) 2025-05-07T19:46:34.0683538Z #define NL_LANGMAX _POSIX2_LINE_MAX 2025-05-07T19:46:34.0683812Z #define __DEC32_MIN_EXP__ (-94) 2025-05-07T19:46:34.0684145Z #define __glibcxx_requires_partitioned_upper(_First,_Last,_Value) 2025-05-07T19:46:34.0684520Z #define __DADDR_T_TYPE __S32_TYPE 2025-05-07T19:46:34.0684872Z #define cudaExternalSemaphoreSignalSkipNvSciBufMemSync 0x01 2025-05-07T19:46:34.0685247Z #define __END_NAMESPACE_C99 2025-05-07T19:46:34.0685509Z #define __glibcxx_integral_traps true 2025-05-07T19:46:34.0685768Z #define _POSIX_PATH_MAX 256 2025-05-07T19:46:34.0686013Z #define __INTPTR_WIDTH__ 64 2025-05-07T19:46:34.0686246Z #define __FLT64X_HAS_INFINITY__ 1 2025-05-07T19:46:34.0686498Z #define _IOS_TRUNC 16 2025-05-07T19:46:34.0686704Z #define _ISOC11_SOURCE 1 2025-05-07T19:46:34.0686937Z #define _GLIBCXX_HAVE_LINUX_FUTEX 1 2025-05-07T19:46:34.0687205Z #define __UINT_LEAST32_MAX__ 0xffffffffU 2025-05-07T19:46:34.0687491Z #define _GLIBCXX_HAVE_QUICK_EXIT 1 2025-05-07T19:46:34.0687829Z #define __glibcxx_requires_irreflexive_pred2(_First,_Last,_Pred) 2025-05-07T19:46:34.0688195Z #define LONG_MIN (-LONG_MAX - 1L) 2025-05-07T19:46:34.0688458Z #define _GLIBCXX_HAVE_SINCOSF 1 2025-05-07T19:46:34.0688699Z #define _IO_UNITBUF 020000 
2025-05-07T19:46:34.0688945Z #define _GLIBCXX_HAVE_SINCOSL 1 2025-05-07T19:46:34.0689194Z #define __FD_SETSIZE 1024 2025-05-07T19:46:34.0689452Z #define getc(_fp) _IO_getc (_fp) 2025-05-07T19:46:34.0689713Z #define be32toh(x) __bswap_32 (x) 2025-05-07T19:46:34.0690059Z #define _GLIBCXX_PACKAGE__GLIBCXX_VERSION "version-unused" 2025-05-07T19:46:34.0690406Z #define __FLT32X_HAS_DENORM__ 1 2025-05-07T19:46:34.0690678Z #define __INT_FAST16_TYPE__ long int 2025-05-07T19:46:34.0690989Z #define isxdigit_l(c,l) __isxdigit_l ((c), (l)) 2025-05-07T19:46:34.0691301Z #define _GLIBCXX_HAVE_GETIPINFO 1 2025-05-07T19:46:34.0691575Z #define __MMX_WITH_SSE__ 1 2025-05-07T19:46:34.0691863Z #define __isalnum_l(c,l) __isctype_l((c), _ISalnum, (l)) 2025-05-07T19:46:34.0692200Z #define _WCHAR_T_DEFINED_ 2025-05-07T19:46:34.0692467Z #define cudaIpcMemLazyEnablePeerAccess 0x01 2025-05-07T19:46:34.0692796Z #define _GLIBCXX_HAVE_AT_QUICK_EXIT 1 2025-05-07T19:46:34.0693074Z #define __INO_T_MATCHES_INO64_T 1 2025-05-07T19:46:34.0693348Z #define __USE_POSIX199506 1 2025-05-07T19:46:34.0693588Z #define _FEATURES_H 1 2025-05-07T19:46:34.0693829Z #define __LDBL_HAS_DENORM__ 1 2025-05-07T19:46:34.0694227Z #define _PSTL_PRAGMA_SIMD_REDUCTION(PRM) _PSTL_PRAGMA(omp simd reduction(PRM)) 2025-05-07T19:46:34.0694692Z #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) 2025-05-07T19:46:34.0695031Z #define __stub_getmsg 2025-05-07T19:46:34.0695326Z #define _IO_FIXED 010000 2025-05-07T19:46:34.0695603Z #define __cpp_lib_addressof_constexpr 201603 2025-05-07T19:46:34.0695909Z #define _GLIBCXX11_USE_C99_STDIO 1 2025-05-07T19:46:34.0696189Z #define __stub_setlogin 2025-05-07T19:46:34.0696421Z #define __stub_fattach 2025-05-07T19:46:34.0696666Z #define __cplusplus 201703L 2025-05-07T19:46:34.0696918Z #define __cpp_ref_qualifiers 200710L 2025-05-07T19:46:34.0697213Z #define _STRUCT_TIMEVAL 1 2025-05-07T19:46:34.0697493Z #define INFINITY (__builtin_inff()) 2025-05-07T19:46:34.0697771Z #define _IO_UNBUFFERED 
2 2025-05-07T19:46:34.0698269Z #define cudaStreamAttributeSynchronizationPolicy cudaLaunchAttributeSynchronizationPolicy 2025-05-07T19:46:34.0698789Z #define _IO_INTERNAL 010 2025-05-07T19:46:34.0699115Z #define __DEC32_MIN__ 1E-95DF 2025-05-07T19:46:34.0699449Z #define cudaKernelNodeAttrValue cudaLaunchAttributeValue 2025-05-07T19:46:34.0699832Z #define __dev_t_defined 2025-05-07T19:46:34.0700077Z #define __DEPRECATED 1 2025-05-07T19:46:34.0700338Z #define __S32_TYPE int 2025-05-07T19:46:34.0700608Z #define __cpp_rvalue_references 200610L 2025-05-07T19:46:34.0700903Z #define __DBL_MAX_EXP__ 1024 2025-05-07T19:46:34.0701189Z #define _IO_fpos_t _G_fpos_t 2025-05-07T19:46:34.0701447Z #define __WCHAR_WIDTH__ 32 2025-05-07T19:46:34.0702048Z #define cudaKernelNodeAttributePreferredSharedMemoryCarveout cudaLaunchAttributePreferredSharedMemoryCarveout 2025-05-07T19:46:34.0702668Z #define _G_HAVE_MREMAP 1 2025-05-07T19:46:34.0703002Z #define __FLT32_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:46:34.0703473Z #define OVERFLOW 3 2025-05-07T19:46:34.0703930Z #define __toascii_l(c,l) ((l), __toascii (c)) 2025-05-07T19:46:34.0704299Z #define __DEC128_EPSILON__ 1E-33DL 2025-05-07T19:46:34.0704610Z #define __SM_32_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:34.0704999Z #define _GLIBCXX_DEFAULT_ABI_TAG _GLIBCXX_ABI_TAG_CXX11 2025-05-07T19:46:34.0705361Z #define __SSE2_MATH__ 1 2025-05-07T19:46:34.0705663Z #define __ATOMIC_HLE_RELEASE 131072 2025-05-07T19:46:34.0706006Z #define __FSFILCNT_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:34.0706366Z #define _IO_STDIO_H 2025-05-07T19:46:34.0706642Z #define PDP_ENDIAN __PDP_ENDIAN 2025-05-07T19:46:34.0706994Z #define isspace_l(c,l) __isspace_l ((c), (l)) 2025-05-07T19:46:34.0707339Z #define __cudaCDP2Memcpy2DAsync 2025-05-07T19:46:34.0707685Z #define __PTRDIFF_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0708044Z #define _GLIBCXX_HAVE_STRERROR_R 1 2025-05-07T19:46:34.0708332Z #define __amd64 1 2025-05-07T19:46:34.0708597Z 
#define _POSIX_TZNAME_MAX 6 2025-05-07T19:46:34.0709115Z #define __cudaCDP2Memset3DAsync 2025-05-07T19:46:34.0709445Z #define __SYSCALL_WORDSIZE 64 2025-05-07T19:46:34.0709754Z #define _GLIBCXX_HAVE_ATTRIBUTE_VISIBILITY 1 2025-05-07T19:46:34.0710096Z #define _EXT_TYPE_TRAITS 1 2025-05-07T19:46:34.0710374Z #define _GLIBCXX_HAVE_POSIX_SEMAPHORE 1 2025-05-07T19:46:34.0710697Z #define _POSIX_RE_DUP_MAX 255 2025-05-07T19:46:34.0710967Z #define __STDC_NO_THREADS__ 1 2025-05-07T19:46:34.0711244Z #define __bounded 2025-05-07T19:46:34.0711491Z #define _GLIBCXX_HAVE_ACOSL 1 2025-05-07T19:46:34.0711768Z #define __USECONDS_T_TYPE __U32_TYPE 2025-05-07T19:46:34.0712081Z #define _IO_DELETE_DONT_CLOSE 0x40 2025-05-07T19:46:34.0712371Z #define __BEGIN_NAMESPACE_STD 2025-05-07T19:46:34.0712665Z #define _PTRDIFF_T_DECLARED 2025-05-07T19:46:34.0712947Z #define __OFF_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0713297Z #define __W_STOPCODE(sig) ((sig) << 8 | 0x7f) 2025-05-07T19:46:34.0713726Z #define cudaStreamAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:34.0714166Z #define _GLIBCXX_HAVE_NETDB_H 1 2025-05-07T19:46:34.0714448Z #define __SM_20_INTRINSICS_HPP__ 2025-05-07T19:46:34.0714821Z #define __cpp_lib_has_unique_object_representations 201606 2025-05-07T19:46:34.0715204Z #define STA_PLL 0x0001 2025-05-07T19:46:34.0715459Z #define __ATOMIC_HLE_ACQUIRE 65536 2025-05-07T19:46:34.0715753Z #define __GNUG__ 11 2025-05-07T19:46:34.0715992Z #define _GLIBCXX_USE_GET_NPROCS 1 2025-05-07T19:46:34.0716440Z #define _T_WCHAR 2025-05-07T19:46:34.0716682Z #define __cudaCDP2GetDeviceCount 2025-05-07T19:46:34.0717004Z #define __specialization_static 2025-05-07T19:46:34.0717319Z #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL 2025-05-07T19:46:34.0717654Z #define __SIZEOF_SIZE_T__ 8 2025-05-07T19:46:34.0717936Z #define cudaArraySparse 0x40 2025-05-07T19:46:34.0718202Z #define STA_PPSFREQ 0x0002 2025-05-07T19:46:34.0718496Z #define _IO_stdin ((_IO_FILE*)(&_IO_2_1_stdin_)) 
2025-05-07T19:46:34.0718806Z #define _WCHAR_T 2025-05-07T19:46:34.0719044Z #define __cudaCDP2Free 2025-05-07T19:46:34.0719792Z #define __FD_ZERO(fdsp) do { int __d0, __d1; __asm__ __volatile__ ("cld; rep; " __FD_ZERO_STOS : "=c" (__d0), "=D" (__d1) : "a" (0), "0" (sizeof (fd_set) / sizeof (__fd_mask)), "1" (&__FDS_BITS (fdsp)[0]) : "memory"); } while (0) 2025-05-07T19:46:34.0720539Z #define __cpp_nsdmi 200809L 2025-05-07T19:46:34.0721069Z #define __glibcxx_min_b(T,B) (__glibcxx_signed_b (T,B) ? -__glibcxx_max_b (T,B) - 1 : (T)0) 2025-05-07T19:46:34.0721517Z #define __FLT64X_MIN_EXP__ (-16381) 2025-05-07T19:46:34.0721798Z #define __SIZEOF_WINT_T__ 4 2025-05-07T19:46:34.0722046Z #define cudaArrayCubemap 0x04 2025-05-07T19:46:34.0722375Z #define _PSTL_MONOTONIC_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:34.0722711Z #define _GLIBCXX_UTILITY 1 2025-05-07T19:46:34.0722963Z #define __NO_CTYPE 1 2025-05-07T19:46:34.0723180Z #define __stub_bdflush 2025-05-07T19:46:34.0723541Z #define _GLIBCXX_MAKE_MOVE_ITERATOR(_Iter) std::make_move_iterator(_Iter) 2025-05-07T19:46:34.0723950Z #define __CORRECT_ISO_CPP_STRING_H_PROTO 2025-05-07T19:46:34.0724256Z #define _GLIBCXX_STDC_HEADERS 1 2025-05-07T19:46:34.0724522Z #define __LONG_LONG_WIDTH__ 64 2025-05-07T19:46:34.0724786Z #define __cpp_initializer_lists 200806L 2025-05-07T19:46:34.0725090Z #define _GLIBCXX_HAVE_NETINET_TCP_H 1 2025-05-07T19:46:34.0725368Z #define __U16_TYPE unsigned short int 2025-05-07T19:46:34.0725704Z #define __glibcxx_requires_can_increment(_First,_Size) 2025-05-07T19:46:34.0726039Z #define _GLIBCXX_HAVE_SYS_PARAM_H 1 2025-05-07T19:46:34.0726323Z #define __FLT32_MAX_EXP__ 128 2025-05-07T19:46:34.0726586Z #define cudaHostRegisterIoMemory 0x04 2025-05-07T19:46:34.0726924Z #define __FD_MASK(d) ((__fd_mask) 1 << ((d) % __NFDBITS)) 2025-05-07T19:46:34.0727271Z #define __cpp_lib_is_invocable 201703 2025-05-07T19:46:34.0727538Z #define _IO_STDIO 040000 2025-05-07T19:46:34.0727858Z #define _SIGSET_NWORDS (1024 
/ (8 * sizeof (unsigned long int))) 2025-05-07T19:46:34.0728225Z #define cudaSurfaceType1DLayered 0xF1 2025-05-07T19:46:34.0728535Z #define cudaArraySurfaceLoadStore 0x02 2025-05-07T19:46:34.0728810Z #define _PTRDIFF_T 2025-05-07T19:46:34.0729029Z #define _MOVE_H 1 2025-05-07T19:46:34.0729243Z #define __cpp_hex_float 201603L 2025-05-07T19:46:34.0729508Z #define ADJ_TAI 0x0080 2025-05-07T19:46:34.0729725Z #define __ptrvalue 2025-05-07T19:46:34.0729952Z #define _GLIBCXX_HOSTED 1 2025-05-07T19:46:34.0730202Z #define __GXX_ABI_VERSION 1016 2025-05-07T19:46:34.0730476Z #define __WTERMSIG(status) ((status) & 0x7f) 2025-05-07T19:46:34.0730783Z #define MATH_ERREXCEPT 2 2025-05-07T19:46:34.0731022Z #define _GLIBCXX_HAS_GTHREADS 1 2025-05-07T19:46:34.0731307Z #define cudaTextureType2DLayered 0xF2 2025-05-07T19:46:34.0731683Z #define __isleap(year) ((year) % 4 == 0 && ((year) % 100 != 0 || (year) % 400 == 0)) 2025-05-07T19:46:34.0732071Z #define __USE_GNU 1 2025-05-07T19:46:34.0732291Z #define __FLT128_HAS_INFINITY__ 1 2025-05-07T19:46:34.0732568Z #define __FLT_MIN_EXP__ (-125) 2025-05-07T19:46:34.0732824Z #define __GCC_HAVE_DWARF2_CFI_ASM 1 2025-05-07T19:46:34.0733208Z #define __FD_CLR(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] &= ~__FD_MASK (d))) 2025-05-07T19:46:34.0733598Z #define WEXITED 4 2025-05-07T19:46:34.0733807Z #define _IO_NO_READS 4 2025-05-07T19:46:34.0734100Z #define cudaGraphKernelNodePortLaunchCompletion 2 2025-05-07T19:46:34.0734425Z #define M_LOG2E 1.4426950408889634074 2025-05-07T19:46:34.0734708Z #define _POSIX_SYMLINK_MAX 255 2025-05-07T19:46:34.0735070Z #define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1 2025-05-07T19:46:34.0735387Z #define __uid_t_defined 2025-05-07T19:46:34.0735620Z #define __FD_ELT(d) ((d) / __NFDBITS) 2025-05-07T19:46:34.0735909Z #define _GLIBCXX_USE_STD_SPEC_FUNCS 1 2025-05-07T19:46:34.0736187Z #define WNOHANG 1 2025-05-07T19:46:34.0736417Z #define alloca(size) __builtin_alloca (size) 2025-05-07T19:46:34.0736726Z #define 
_GLIBCXX_HAVE_HYPOTF 1 2025-05-07T19:46:34.0736988Z #define cudaEventDefault 0x00 2025-05-07T19:46:34.0737287Z #define __maxnreg__(a) __attribute__((maxnreg(a))) 2025-05-07T19:46:34.0737598Z #define NL_SETMAX INT_MAX 2025-05-07T19:46:34.0737850Z #define __x86_64 1 2025-05-07T19:46:34.0738075Z #define __cudaCDP2LaunchDevice 2025-05-07T19:46:34.0738522Z #define __REDIRECT(name,proto,alias) name proto __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:34.0738985Z #define _GLIBCXX_BEGIN_NAMESPACE_CXX11 namespace __cxx11 { 2025-05-07T19:46:34.0739483Z #define __extern_always_inline extern __always_inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:34.0739926Z #define __PTRDIFF_T 2025-05-07T19:46:34.0740233Z #define __exctype_l(name) extern int name (int, __locale_t) __THROW 2025-05-07T19:46:34.0740618Z #define _GLIBCXX_HAVE_FINITEL 1 2025-05-07T19:46:34.0740883Z #define __SM_35_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:34.0741174Z #define _Mlong_double_ long double 2025-05-07T19:46:34.0741440Z #define __cpp_lambdas 200907L 2025-05-07T19:46:34.0741697Z #define _IO_DEC 020 2025-05-07T19:46:34.0741916Z #define _GLIBCXX_HAVE_SINHL 1 2025-05-07T19:46:34.0742189Z #define _POSIX_CLOCKRES_MIN 20000000 2025-05-07T19:46:34.0742484Z #define __INT_FAST64_TYPE__ long int 2025-05-07T19:46:34.0742755Z #define ADJ_TIMECONST 0x0020 2025-05-07T19:46:34.0743022Z #define _GLIBCXX_HAVE_SQRTL 1 2025-05-07T19:46:34.0743307Z #define __cudaCDP2DeviceGetSharedMemConfig 2025-05-07T19:46:34.0743929Z #define _GLIBCXX_HAVE_STDALIGN_H 1 2025-05-07T19:46:34.0744217Z #define _ANSI_STDDEF_H 2025-05-07T19:46:34.0744515Z #define _GLIBCXX_MOVE(__val) std::move(__val) 2025-05-07T19:46:34.0744856Z #define _GLIBCXX_HAVE_STRERROR_L 1 2025-05-07T19:46:34.0745261Z #define __FLT64_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F64 2025-05-07T19:46:34.0745669Z #define _GLIBCXX_USE_DEV_RANDOM 1 2025-05-07T19:46:34.0745981Z #define _STL_ITERATOR_BASE_TYPES_H 1 2025-05-07T19:46:34.0746299Z #define 
__cpp_template_auto 201606L 2025-05-07T19:46:34.0746673Z #define __DBL_MIN__ double(2.22507385850720138309023271733240406e-308L) 2025-05-07T19:46:34.0747087Z #define _GLIBCXX_HAVE_SYS_SEM_H 1 2025-05-07T19:46:34.0747370Z #define __key_t_defined 2025-05-07T19:46:34.0747643Z #define _IO_MAGIC_MASK 0xFFFF0000 2025-05-07T19:46:34.0748027Z #define __cluster_dims__(...) __attribute__((cluster_dims(__VA_ARGS__))) 2025-05-07T19:46:34.0748550Z #define __FLT128_EPSILON__ 1.92592994438723585305597794258492732e-34F128 2025-05-07T19:46:34.0748941Z #define __GNUC_VA_LIST 2025-05-07T19:46:34.0749313Z #define __FLT64X_NORM_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:46:34.0749743Z #define __SIZEOF_POINTER__ 8 2025-05-07T19:46:34.0750017Z #define CLOCK_REALTIME_COARSE 5 2025-05-07T19:46:34.0750316Z #define _GLIBCXX14_CONSTEXPR constexpr 2025-05-07T19:46:34.0750617Z #define __USE_XOPEN2KXSI 1 2025-05-07T19:46:34.0750881Z #define __WCOREFLAG 0x80 2025-05-07T19:46:34.0751134Z #define M_2_SQRTPI 1.12837916709551257390 2025-05-07T19:46:34.0751460Z #define cudaEventDisableTiming 0x02 2025-05-07T19:46:34.0751739Z #define __LP64__ 1 2025-05-07T19:46:34.0752001Z #define __isascii_l(c,l) ((l), __isascii (c)) 2025-05-07T19:46:34.0752339Z #define cudaStreamNonBlocking 0x01 2025-05-07T19:46:34.0752624Z #define _IO_off64_t __off64_t 2025-05-07T19:46:34.0752894Z #define __DBL_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0753158Z #define __time_t_defined 1 2025-05-07T19:46:34.0753434Z #define _POSIX_SYMLOOP_MAX 8 2025-05-07T19:46:34.0753788Z #define __FLT32X_EPSILON__ 2.22044604925031308084726333618164062e-16F32x 2025-05-07T19:46:34.0754187Z #define __USE_UNIX98 1 2025-05-07T19:46:34.0754548Z #define __MODE_T_TYPE __U32_TYPE 2025-05-07T19:46:34.0754848Z #define CLOCK_REALTIME_ALARM 8 2025-05-07T19:46:34.0755115Z #define _GLIBCXX_HAVE_STRINGS_H 1 2025-05-07T19:46:34.0755418Z #define __LEAF_ATTR __attribute__ ((__leaf__)) 2025-05-07T19:46:34.0755727Z #define __DECIMAL_BID_FORMAT__ 1 
2025-05-07T19:46:34.0755976Z #define SEEK_CUR 1 2025-05-07T19:46:34.0756308Z #define __RLIM64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:34.0756546Z #define _ASSERT_H 1 2025-05-07T19:46:34.0757092Z #define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME,OP) _PSTL_PRAGMA(omp declare reduction(NAME:OP : omp_out(omp_in)) initializer(omp_priv = omp_orig)) 2025-05-07T19:46:34.0757692Z #define _GLIBCXX_USE_DEPRECATED 1 2025-05-07T19:46:34.0757946Z #define CHAR_MAX SCHAR_MAX 2025-05-07T19:46:34.0758252Z #define _GLIBCXX_HAVE_SETENV 1 2025-05-07T19:46:34.0758506Z #define NL_ARGMAX _POSIX_ARG_MAX 2025-05-07T19:46:34.0758759Z #define _GLIBCXX_USE_UTIMENSAT 1 2025-05-07T19:46:34.0759111Z #define __extern_inline extern __inline __attribute__ ((__gnu_inline__)) 2025-05-07T19:46:34.0759523Z #define _GLIBCXX_DEBUG_ONLY(_Statement) 2025-05-07T19:46:34.0760163Z #define _IO_putc_unlocked(_ch,_fp) (_IO_BE ((_fp)->_IO_write_ptr >= (_fp)->_IO_write_end, 0) ? __overflow (_fp, (unsigned char) (_ch)) : (unsigned char) (*(_fp)->_IO_write_ptr++ = (_ch))) 2025-05-07T19:46:34.0760804Z #define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 2025-05-07T19:46:34.0761079Z #define _IO_BOOLALPHA 0200000 2025-05-07T19:46:34.0761412Z #define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT (_MSC_VER >= 1912) 2025-05-07T19:46:34.0761773Z #define _GLIBCXX_PACKAGE_URL "" 2025-05-07T19:46:34.0762017Z #define __FLT64_MIN_10_EXP__ (-307) 2025-05-07T19:46:34.0762283Z #define cudaArrayDefault 0x00 2025-05-07T19:46:34.0762537Z #define __cudaCDP2LaunchDeviceV2 2025-05-07T19:46:34.0762812Z #define __FDS_BITS(set) ((set)->fds_bits) 2025-05-07T19:46:34.0763075Z #define TLOSS 5 2025-05-07T19:46:34.0763274Z #define __ssize_t_defined 2025-05-07T19:46:34.0763505Z #define __CUDACC_VER_BUILD__ 61 2025-05-07T19:46:34.0763769Z #define ULONG_MAX (LONG_MAX * 2UL + 1UL) 2025-05-07T19:46:34.0764042Z #define __FLT64X_DECIMAL_DIG__ 21 2025-05-07T19:46:34.0764307Z #define _POSIX_HIWAT _POSIX_PIPE_BUF 2025-05-07T19:46:34.0764580Z #define __DEC128_MIN__ 1E-6143DL 
2025-05-07T19:46:34.0764848Z #define __cudaCDP2EventRecordWithFlags 2025-05-07T19:46:34.0765296Z #define _GLIBCXX_ATOMIC_BUILTINS 1 2025-05-07T19:46:34.0765569Z #define cudaPeerAccessDefault 0x00 2025-05-07T19:46:34.0765835Z #define _GLIBCXX_HAVE_SYS_SOCKET_H 1 2025-05-07T19:46:34.0766085Z #define __REGISTER_PREFIX__ 2025-05-07T19:46:34.0766319Z #define __UINT16_MAX__ 0xffff 2025-05-07T19:46:34.0766620Z #define __glibcxx_requires_sorted_set(_First1,_Last1,_First2) 2025-05-07T19:46:34.0766956Z #define _IOS_NOREPLACE 64 2025-05-07T19:46:34.0767171Z #define __cdecl 2025-05-07T19:46:34.0767395Z #define cudaEventInterprocess 0x04 2025-05-07T19:46:34.0767701Z #define M_SQRT1_2l 0.707106781186547524400844362104849039L 2025-05-07T19:46:34.0768006Z #define LOGIN_NAME_MAX 256 2025-05-07T19:46:34.0768249Z #define _IO_TIED_PUT_GET 0x400 2025-05-07T19:46:34.0768490Z #define X_TLOSS 1.41484755040568800000e+16 2025-05-07T19:46:34.0768769Z #define CUDA_IPC_HANDLE_SIZE 64 2025-05-07T19:46:34.0769011Z #define __LDBL_HAS_INFINITY__ 1 2025-05-07T19:46:34.0769299Z #define __attribute_pure__ __attribute__ ((__pure__)) 2025-05-07T19:46:34.0769604Z #define __TEXTURE_TYPES_H__ 2025-05-07T19:46:34.0769988Z #define __NV_GLIBCXX_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:34.0770405Z #define ADJ_NANO 0x2000 2025-05-07T19:46:34.0770682Z #define __FLT32_MIN__ 1.17549435082228750796873653722224568e-38F32 2025-05-07T19:46:34.0771018Z #define __UINT8_TYPE__ unsigned char 2025-05-07T19:46:34.0771284Z #define _GLIBCXX_HAVE_ISWBLANK 1 2025-05-07T19:46:34.0771523Z #define __FLT_DIG__ 6 2025-05-07T19:46:34.0771846Z #define __REDIRECT_LDBL(name,proto,alias) __REDIRECT (name, proto, alias) 2025-05-07T19:46:34.0772222Z #define __NO_INLINE__ 1 2025-05-07T19:46:34.0772588Z #define _PSTL_EARLYEXIT_PRESENT (__INTEL_COMPILER >= 1800) 2025-05-07T19:46:34.0772925Z #define _POSIX_NGROUPS_MAX 8 2025-05-07T19:46:34.0773203Z #define ADJ_STATUS 0x0010 
2025-05-07T19:46:34.0773450Z #define __cudaCDP2MemcpyAsync_ptsz 2025-05-07T19:46:34.0773748Z #define CLOCK_BOOTTIME_ALARM 9 2025-05-07T19:46:34.0774010Z #define LONG_LONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:34.0774313Z #define _GLIBCXX_HAVE_OBSOLETE_ISNAN 1 2025-05-07T19:46:34.0774593Z #define __DEC_EVAL_METHOD__ 2 2025-05-07T19:46:34.0774975Z #define cudaStreamGraphFireAndForget (cudaStream_t)0x0200000000000000 2025-05-07T19:46:34.0775383Z #define _GLIBCXX_HAVE_ALIGNED_ALLOC 1 2025-05-07T19:46:34.0775743Z #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL 2025-05-07T19:46:34.0776142Z #define CHAR_MIN SCHAR_MIN 2025-05-07T19:46:34.0776390Z #define MAX_CANON 255 2025-05-07T19:46:34.0776632Z #define __FLT_MANT_DIG__ 24 2025-05-07T19:46:34.0776878Z #define __LDBL_DECIMAL_DIG__ 21 2025-05-07T19:46:34.0777160Z #define _GLIBCXX_HAVE_COMPLEX_H 1 2025-05-07T19:46:34.0777437Z #define _PSTL_PRAGMA_VECTOR_UNALIGNED 2025-05-07T19:46:34.0777752Z #define _POSIX_FD_SETSIZE _POSIX_OPEN_MAX 2025-05-07T19:46:34.0778041Z #define _GLIBCXX_HAVE_HYPOT 1 2025-05-07T19:46:34.0778331Z #define __cudaCDP2Memset2DAsync_ptsz 2025-05-07T19:46:34.0778640Z #define _GLIBCXX_TR1_MODIFIED_BESSEL_FUNC_TCC 1 2025-05-07T19:46:34.0778963Z #define __VERSION__ "11.4.0" 2025-05-07T19:46:34.0779238Z #define _GLIBCXX11_USE_C99_STDLIB 1 2025-05-07T19:46:34.0779524Z #define cudaHostRegisterMapped 0x02 2025-05-07T19:46:34.0779823Z #define _GLIBCXX_HAVE_INT64_T 1 2025-05-07T19:46:34.0780093Z #define _GLIBCXX_USE_CONSTEXPR constexpr 2025-05-07T19:46:34.0780421Z #define FD_ZERO(fdsetp) __FD_ZERO (fdsetp) 2025-05-07T19:46:34.0780707Z #define __UINT64_C(c) c ## UL 2025-05-07T19:46:34.0780966Z #define MOD_OFFSET ADJ_OFFSET 2025-05-07T19:46:34.0781204Z #define _SYS_TYPES_H 1 2025-05-07T19:46:34.0781445Z #define AIO_PRIO_DELTA_MAX 20 2025-05-07T19:46:34.0781692Z #define _GLIBCXX_HAVE_TANHF 1 2025-05-07T19:46:34.0781941Z #define _SYS_CDEFS_H 1 2025-05-07T19:46:34.0782175Z #define _GLIBCXX_HAVE_TANHL 1 
2025-05-07T19:46:34.0782432Z #define __cpp_unicode_characters 201411L 2025-05-07T19:46:34.0782723Z #define _IO_ERR_SEEN 0x20 2025-05-07T19:46:34.0782962Z #define _GLIBCXX_USE_DECIMAL_FLOAT 1 2025-05-07T19:46:34.0783254Z #define __cudaCDP2StreamDestroy 2025-05-07T19:46:34.0783622Z #define FP_SUBNORMAL 3 2025-05-07T19:46:34.0784061Z #define cudaOccupancyDefault 0x00 2025-05-07T19:46:34.0784348Z #define _INITIALIZER_LIST 2025-05-07T19:46:34.0784623Z #define _STDC_PREDEF_H 1 2025-05-07T19:46:34.0784975Z #define _GLIBCXX_PACKAGE_BUGREPORT "" 2025-05-07T19:46:34.0785290Z #define _GLIBCXX_HAVE_MODF 1 2025-05-07T19:46:34.0785575Z #define _IO_file_flags _flags 2025-05-07T19:46:34.0785840Z #define __USE_XOPEN2K8 1 2025-05-07T19:46:34.0786103Z #define htobe64(x) __bswap_64 (x) 2025-05-07T19:46:34.0786389Z #define _OLD_STDIO_MAGIC 0xFABC0000 2025-05-07T19:46:34.0786684Z #define HUGE 3.40282347e+38F 2025-05-07T19:46:34.0786963Z #define __cpp_lib_is_null_pointer 201309 2025-05-07T19:46:34.0787367Z #define WEXITSTATUS(status) __WEXITSTATUS (__WAIT_INT (status)) 2025-05-07T19:46:34.0787777Z #define islower_l(c,l) __islower_l ((c), (l)) 2025-05-07T19:46:34.0788113Z #define _GLIBCXX_USE_CXX11_ABI 1 2025-05-07T19:46:34.0788392Z #define _GLIBCXX_HAVE_SYMLINK 1 2025-05-07T19:46:34.0788674Z #define _BSD_SOURCE 1 2025-05-07T19:46:34.0788929Z #define _GLIBCXX_THROW(_EXC) 2025-05-07T19:46:34.0789834Z #define _GLIBCXX_HAS_NESTED_TYPE(_NTYPE) template> struct __has_ ##_NTYPE : false_type { }; template struct __has_ ##_NTYPE<_Tp, __void_t> : true_type { }; 2025-05-07T19:46:34.0790766Z #define __catch(X) catch(X) 2025-05-07T19:46:34.0791035Z #define __INT_LEAST32_MAX__ 0x7fffffff 2025-05-07T19:46:34.0791351Z #define LINE_MAX _POSIX2_LINE_MAX 2025-05-07T19:46:34.0791637Z #define __TIMER_T_TYPE void * 2025-05-07T19:46:34.0792004Z #define __STRING(x) #x 2025-05-07T19:46:34.0792255Z #define __GCC_ATOMIC_INT_LOCK_FREE 2 2025-05-07T19:46:34.0792558Z #define _T_PTRDIFF_ 2025-05-07T19:46:34.0792825Z 
#define _GLIBCXX_USE_NOEXCEPT noexcept 2025-05-07T19:46:34.0793134Z #define cudaEventWaitExternal 0x01 2025-05-07T19:46:34.0793431Z #define __unbounded 2025-05-07T19:46:34.0793676Z #define __DEVICE_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:34.0793992Z #define __FLT128_MAX_EXP__ 16384 2025-05-07T19:46:34.0794278Z #define __INO_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:34.0794607Z #define be16toh(x) __bswap_16 (x) 2025-05-07T19:46:34.0794891Z #define __cpp_lib_is_final 201402L 2025-05-07T19:46:34.0795209Z #define _GLIBCXX_BEGIN_NAMESPACE_CONTAINER 2025-05-07T19:46:34.0795564Z #define LONG_LONG_MIN (-LONG_LONG_MAX - 1LL) 2025-05-07T19:46:34.0795939Z #define __MATH_DECLARE_LDOUBLE 1 2025-05-07T19:46:34.0796346Z #define __managed__ __location__(managed) 2025-05-07T19:46:34.0796632Z #define _POSIX2_EXPR_NEST_MAX 32 2025-05-07T19:46:34.0797028Z #define __GNUC_PREREQ(maj,min) ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)) 2025-05-07T19:46:34.0797442Z #define _POSIX_STREAM_MAX 8 2025-05-07T19:46:34.0797703Z #define __LIBRARY_TYPES_H__ 2025-05-07T19:46:34.0798058Z #define _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_END_NAMESPACE_CXX11 2025-05-07T19:46:34.0798459Z #define __FLT32_MANT_DIG__ 24 2025-05-07T19:46:34.0798710Z #define _SYS_SIZE_T_H 2025-05-07T19:46:34.0798983Z #define _PSTL_VERSION_MINOR ((_PSTL_VERSION % 1000) / 10) 2025-05-07T19:46:34.0799326Z #define _GLIBCXX_STDLIB_H 1 2025-05-07T19:46:34.0799589Z #define isupper_l(c,l) __isupper_l ((c), (l)) 2025-05-07T19:46:34.0799889Z #define _CRTIMP 2025-05-07T19:46:34.0800100Z #define _GLIBCXX_CXX_CONFIG_H 1 2025-05-07T19:46:34.0800409Z #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:34.0800725Z #define STA_PPSJITTER 0x0200 2025-05-07T19:46:34.0801085Z #define _IO_feof_unlocked(__fp) (((__fp)->_flags & _IO_EOF_SEEN) != 0) 2025-05-07T19:46:34.0801485Z #define __SUSECONDS_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0801810Z #define _GLIBCXX_HAVE_ISINFF 1 2025-05-07T19:46:34.0802092Z 
#define __glibcxx_requires_subscript(_N) 2025-05-07T19:46:34.0802368Z #define __SIZE_T__ 2025-05-07T19:46:34.0802589Z #define __stub_gtty 2025-05-07T19:46:34.0802819Z #define __pid_t_defined 2025-05-07T19:46:34.0803102Z #define _GLIBCXX_FWDREF(_Tp) _Tp&& 2025-05-07T19:46:34.0803398Z #define __NLINK_T_TYPE __SYSCALL_ULONG_TYPE 2025-05-07T19:46:34.0803741Z #define __glibcxx_function_requires(...) 2025-05-07T19:46:34.0804035Z #define __SM_80_RT_HPP__ 2025-05-07T19:46:34.0804302Z #define __need_clockid_t 2025-05-07T19:46:34.0804553Z #define SSIZE_MAX LONG_MAX 2025-05-07T19:46:34.0804839Z #define _GLIBCXX_HAVE_USELOCALE 1 2025-05-07T19:46:34.0805192Z #define __glibcxx_requires_string_len(_String,_Len) 2025-05-07T19:46:34.0805520Z #define _IO_HEX 0100 2025-05-07T19:46:34.0805816Z #define __NFDBITS (8 * (int) sizeof (__fd_mask)) 2025-05-07T19:46:34.0806159Z #define cudaExternalMemoryDedicated 0x1 2025-05-07T19:46:34.0806298Z #define _GLIBCXX_HAVE_TGMATH_H 1 2025-05-07T19:46:34.0806411Z #define _GLIBCXX11_USE_C99_COMPLEX 1 2025-05-07T19:46:34.0806636Z #define _GLIBCXX17_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:34.0806763Z #define ispunct_l(c,l) __ispunct_l ((c), (l)) 2025-05-07T19:46:34.0806902Z #define __cpp_aggregate_bases 201603L 2025-05-07T19:46:34.0807010Z #define __cudaGet_blockDim() blockDim 2025-05-07T19:46:34.0807121Z #define __cudaCDP2Memcpy3DAsync 2025-05-07T19:46:34.0807256Z #define __cudaCDP2MemcpyAsync 2025-05-07T19:46:34.0807354Z #define __stub_sstk 2025-05-07T19:46:34.0807458Z #define _IO_IN_BACKUP 0x100 2025-05-07T19:46:34.0807620Z #define _GLIBCXX_USE_C99_STDLIB _GLIBCXX11_USE_C99_STDLIB 2025-05-07T19:46:34.0807745Z #define __wur 2025-05-07T19:46:34.0807872Z #define isprint_l(c,l) __isprint_l ((c), (l)) 2025-05-07T19:46:34.0807969Z #define _G_HAVE_MMAP 1 2025-05-07T19:46:34.0808090Z #define _IO_OCT 040 2025-05-07T19:46:34.0808193Z #define __FLT128_HAS_DENORM__ 1 2025-05-07T19:46:34.0808369Z #define NL_MSGMAX INT_MAX 
2025-05-07T19:46:34.0808472Z #define _GLIBCXX_USE_LFS 1 2025-05-07T19:46:34.0808802Z #define cudaDeviceScheduleBlockingSync 0x04 2025-05-07T19:46:34.0808940Z #define _POSIX_RTSIG_MAX 8 2025-05-07T19:46:34.0809221Z #define _GLIBCXX_NOEXCEPT noexcept 2025-05-07T19:46:34.0809469Z #define __glibcxx_requires_partitioned_lower(_First,_Last,_Value) 2025-05-07T19:46:34.0809583Z #define __FLT32_DECIMAL_DIG__ 9 2025-05-07T19:46:34.0809691Z #define _STL_ALGOBASE_H 1 2025-05-07T19:46:34.0809920Z #define __cudaCDP2MemsetAsync_ptsz 2025-05-07T19:46:34.0810055Z #define __off64_t_defined 2025-05-07T19:46:34.0810167Z #define _GLIBCXX_WEAK_DEFINITION 2025-05-07T19:46:34.0810321Z #define __FLT128_DIG__ 33 2025-05-07T19:46:34.0810568Z #define _GLIBCXX_USE_C99_INTTYPES_TR1 1 2025-05-07T19:46:34.0810672Z #define _GLIBCXX_HAVE_LOCALE_H 1 2025-05-07T19:46:34.0810765Z #define __INT32_C(c) c 2025-05-07T19:46:34.0810867Z #define __DEC64_EPSILON__ 1E-15DD 2025-05-07T19:46:34.0810996Z #define __ORDER_PDP_ENDIAN__ 3412 2025-05-07T19:46:34.0811097Z #define __DEC128_MIN_EXP__ (-6142) 2025-05-07T19:46:34.0811195Z #define __PDP_ENDIAN 3412 2025-05-07T19:46:34.0811305Z #define _ISOC95_SOURCE 1 2025-05-07T19:46:34.0811407Z #define _IO_fpos64_t _G_fpos64_t 2025-05-07T19:46:34.0811543Z #define M_PI_2l 1.570796326794896619231321691639751442L 2025-05-07T19:46:34.0811643Z #define BYTE_ORDER __BYTE_ORDER 2025-05-07T19:46:34.0811753Z #define __SM_90_RT_HPP__ 2025-05-07T19:46:34.0811854Z #define __INT_FAST32_TYPE__ long int 2025-05-07T19:46:34.0811953Z #define __have_pthread_attr_t 1 2025-05-07T19:46:34.0812069Z #define _GLIBCXX_HAVE_LIMIT_DATA 1 2025-05-07T19:46:34.0812301Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11 _GLIBCXX_BEGIN_NAMESPACE_CXX11 2025-05-07T19:46:34.0812412Z #define __cudaCDP2StreamWaitEvent 2025-05-07T19:46:34.0812533Z #define __cudaCDP2EventRecord 2025-05-07T19:46:34.0812630Z #define _BITS_TYPESIZES_H 1 2025-05-07T19:46:34.0812719Z #define htole32(x) (x) 2025-05-07T19:46:34.0812969Z 
#define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessorWithFlags 2025-05-07T19:46:34.0813109Z #define __SYSCALL_SLONG_TYPE __SLONGWORD_TYPE 2025-05-07T19:46:34.0813220Z #define _GLIBCXX_USE_C99_MATH_TR1 1 2025-05-07T19:46:34.0813390Z #define WSTOPSIG(status) __WSTOPSIG (__WAIT_INT (status)) 2025-05-07T19:46:34.0813548Z #define _GLIBCXX_USE_C99_MATH _GLIBCXX11_USE_C99_MATH 2025-05-07T19:46:34.0813677Z #define __UINT_LEAST16_TYPE__ short unsigned int 2025-05-07T19:46:34.0813822Z #define __WIFEXITED(status) (__WTERMSIG(status) == 0) 2025-05-07T19:46:34.0813925Z #define ADJ_OFFSET 0x0001 2025-05-07T19:46:34.0814061Z #define cudaArrayLayered 0x01 2025-05-07T19:46:34.0814241Z #define _PSTL_ICC_18_OMP_SIMD_BROKEN (__INTEL_COMPILER == 1800) 2025-05-07T19:46:34.0814364Z #define cudaEventRecordDefault 0x00 2025-05-07T19:46:34.0814494Z #define _GLIBCXX_HAVE_FMODF 1 2025-05-07T19:46:34.0814605Z #define _PSTL_PRAGMA_MESSAGE(x) 2025-05-07T19:46:34.0814699Z #define unix 1 2025-05-07T19:46:34.0814811Z #define __DBL_HAS_DENORM__ 1 2025-05-07T19:46:34.0814941Z #define _POSIX_CHILD_MAX 25 2025-05-07T19:46:34.0815048Z #define _POSIX_MAX_INPUT 255 2025-05-07T19:46:34.0815173Z #define __cudaCDP2DeviceGetCacheConfig 2025-05-07T19:46:34.0815299Z #define __USE_POSIX 1 2025-05-07T19:46:34.0815412Z #define __FD_ZERO_STOS "stosq" 2025-05-07T19:46:34.0815556Z #define _PSTL_VERSION_MAJOR (_PSTL_VERSION / 1000) 2025-05-07T19:46:34.0815666Z #define __THROWNL throw () 2025-05-07T19:46:34.0815798Z #define __cpp_rtti 199711L 2025-05-07T19:46:34.0815915Z #define __SIZE_TYPE__ long unsigned int 2025-05-07T19:46:34.0816019Z #define __PMT(args) args 2025-05-07T19:46:34.0816168Z #define __UINT64_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0816333Z #define __va_arg_pack_len() __builtin_va_arg_pack_len () 2025-05-07T19:46:34.0816459Z #define __ULONGWORD_TYPE unsigned long int 2025-05-07T19:46:34.0816561Z #define _SIZE_T_DECLARED 2025-05-07T19:46:34.0816691Z #define _PSTL_STRING_AUX(x) #x 
2025-05-07T19:46:34.0816897Z #define __FLT_IS_IEC_60559__ 2 2025-05-07T19:46:34.0817324Z #define _PSTL_CPP14_MAKE_REVERSE_ITERATOR_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201402L || __cpp_lib_make_reverse_iterator == 201402) 2025-05-07T19:46:34.0836435Z #define _GLIBCXX_HAVE_LIMIT_AS 1 2025-05-07T19:46:34.0836581Z #define XATTR_LIST_MAX 65536 2025-05-07T19:46:34.0836672Z #define __CUDACC_VER_MAJOR__ 12 2025-05-07T19:46:34.0836817Z #define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE" 2025-05-07T19:46:34.0836907Z #define _WCHAR_T_H 2025-05-07T19:46:34.0836990Z #define __FLT64X_DIG__ 18 2025-05-07T19:46:34.0837074Z #define _IO_SHOWBASE 0200 2025-05-07T19:46:34.0837164Z #define _POSIX_QLIMIT 1 2025-05-07T19:46:34.0837257Z #define __INT8_TYPE__ signed char 2025-05-07T19:46:34.0838620Z #define __SURFACE_TYPES_H__ 2025-05-07T19:46:34.0838717Z #define __CUDA_ARCH__ 520 2025-05-07T19:46:34.0838830Z #define __cpp_digit_separators 201309L 2025-05-07T19:46:34.0838910Z #define __ELF__ 1 2025-05-07T19:46:34.0839003Z #define CLOCK_THREAD_CPUTIME_ID 3 2025-05-07T19:46:34.0839110Z #define __GCC_ASM_FLAG_OUTPUTS__ 1 2025-05-07T19:46:34.0839189Z #define STA_INS 0x0010 2025-05-07T19:46:34.0839283Z #define __UINT32_TYPE__ unsigned int 2025-05-07T19:46:34.0839446Z #define _toupper(c) ((int) (*__ctype_toupper_loc ())[(int) (c)]) 2025-05-07T19:46:34.0839539Z #define _BITS_BYTESWAP_H 1 2025-05-07T19:46:34.0839627Z #define __ID_T_TYPE __U32_TYPE 2025-05-07T19:46:34.0839731Z #define __TIME_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0839837Z #define __DEVICE_DOUBLE_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0839927Z #define _GLIBCXX_HAVE_MBSTATE_T 1 2025-05-07T19:46:34.0840023Z #define __cpp_lib_logical_traits 201510 2025-05-07T19:46:34.0840118Z #define ADJ_OFFSET_SS_READ 0xa001 2025-05-07T19:46:34.0840264Z #define __warnattr(msg) __attribute__((__warning__ (msg))) 2025-05-07T19:46:34.0840412Z #define _PSTL_PRAGMA_LOCATION " [Parallel STL message]: " 2025-05-07T19:46:34.0840503Z #define 
_IO_funlockfile(_fp) 2025-05-07T19:46:34.0840810Z #define cudaKernelNodeAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:34.0840932Z #define M_2_PIl 0.636619772367581343075535053490057448L 2025-05-07T19:46:34.0841020Z #define __DRIVER_TYPES_H__ 2025-05-07T19:46:34.0841107Z #define __FLT_RADIX__ 2 2025-05-07T19:46:34.0841200Z #define __INT_LEAST16_TYPE__ short int 2025-05-07T19:46:34.0841354Z #define __LDBL_EPSILON__ 1.08420217248550443400745280086994171e-19L 2025-05-07T19:46:34.0841442Z #define __UINTMAX_C(c) c ## UL 2025-05-07T19:46:34.0841538Z #define _GLIBCXX_USE_LSTAT 1 2025-05-07T19:46:34.0841634Z #define minor(dev) gnu_dev_minor (dev) 2025-05-07T19:46:34.0841721Z #define _POSIX_C_SOURCE 200809L 2025-05-07T19:46:34.0841821Z #define _GLIBCXX_HAVE_DIRENT_H 1 2025-05-07T19:46:34.0841917Z #define __GLIBCXX_BITSIZE_INT_N_0 128 2025-05-07T19:46:34.0841991Z #define WORD_BIT 32 2025-05-07T19:46:34.0842068Z #define _IO_USER_BUF 1 2025-05-07T19:46:34.0842163Z #define __VECTOR_TYPES_H__ 2025-05-07T19:46:34.0842254Z #define __SM_20_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0842354Z #define cudaHostAllocPortable 0x01 2025-05-07T19:46:34.0842453Z #define PTHREAD_STACK_MIN 16384 2025-05-07T19:46:34.0842539Z #define __long_double_t long double 2025-05-07T19:46:34.0842626Z #define _GLIBCXX_HAVE_ISINF 1 2025-05-07T19:46:34.0842709Z #define _POSIX_ARG_MAX 4096 2025-05-07T19:46:34.0843093Z #define cudaKernelNodeAttributeDeviceUpdatableKernelNode cudaLaunchAttributeDeviceUpdatableKernelNode 2025-05-07T19:46:34.0843166Z #define __k8 1 2025-05-07T19:46:34.0843345Z #define _GLIBCXX_NO_OBSOLETE_ISINF_ISNAN_DYNAMIC __GLIBC_PREREQ(2,23) 2025-05-07T19:46:34.0843518Z #define __FLT32X_MIN__ 2.22507385850720138309023271733240406e-308F32x 2025-05-07T19:46:34.0843623Z #define __LDBL_REDIR(name,proto) name proto 2025-05-07T19:46:34.0843719Z #define __SIG_ATOMIC_MAX__ 0x7fffffff 2025-05-07T19:46:34.0843820Z #define __SM_30_INTRINSICS_HPP__ 
2025-05-07T19:46:34.0843911Z #define _GLIBCXX_EXTERN_TEMPLATE 1 2025-05-07T19:46:34.0843995Z #define __blksize_t_defined 2025-05-07T19:46:34.0844150Z #define _IO_SHOWPOINT 0400 2025-05-07T19:46:34.0844252Z #define _GLIBCXX_HAVE_LIMIT_RSS 1 2025-05-07T19:46:34.0844353Z #define cudaDeviceLmemResizeToMax 0x10 2025-05-07T19:46:34.0844438Z #define _GLIBCXX_X86_RDRAND 1 2025-05-07T19:46:34.0844545Z #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 2025-05-07T19:46:34.0844629Z #define _IO_IS_FILEBUF 0x2000 2025-05-07T19:46:34.0844713Z #define _GLIBCXX_USE_DUAL_ABI 1 2025-05-07T19:46:34.0844957Z #define __bswap_constant_16(x) ((unsigned short int) ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8))) 2025-05-07T19:46:34.0845285Z #define cudaSignalExternalSemaphoresAsync __CUDART_API_PTSZ(cudaSignalExternalSemaphoresAsync_v2) 2025-05-07T19:46:34.0845374Z #define UCHAR_MAX (SCHAR_MAX * 2 + 1) 2025-05-07T19:46:34.0845516Z #define __SIZEOF_PTRDIFF_T__ 8 2025-05-07T19:46:34.0845600Z #define SEEK_SET 0 2025-05-07T19:46:34.0845693Z #define _GLIBCXX_TR1_GAMMA_TCC 1 2025-05-07T19:46:34.0845777Z #define __CUDA_API_VER_MINOR__ 8 2025-05-07T19:46:34.0845964Z #define _GLIBCXX_VISIBILITY(V) __attribute__ ((__visibility__ (#V))) 2025-05-07T19:46:34.0846064Z #define __cudaCDP2GetLastError 2025-05-07T19:46:34.0846150Z #define _GLIBCXX_HAVE_COSL 1 2025-05-07T19:46:34.0846234Z #define _MATH_H_MATHDEF 1 2025-05-07T19:46:34.0846560Z #define __bswap_constant_32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24)) 2025-05-07T19:46:34.0846650Z #define _GLIBCXX_USE_FLOAT128 1 2025-05-07T19:46:34.0846737Z #define _IO_FLAGS2_NOTCANCEL 2 2025-05-07T19:46:34.0846830Z #define __stub_sigreturn 2025-05-07T19:46:34.0847060Z #define __errordecl(name,msg) extern void name (void) __attribute__((__error__ (msg))) 2025-05-07T19:46:34.0847151Z #define _GLIBCXX_HAVE_UTIME_H 1 2025-05-07T19:46:34.0847237Z #define __HOST_CONFIG_H__ 2025-05-07T19:46:34.0847336Z #define 
_XOPEN_SOURCE_EXTENDED 1 2025-05-07T19:46:34.0847417Z #define CLOCK_TAI 11 2025-05-07T19:46:34.0847515Z #define _GLIBCXX_END_NAMESPACE_VERSION 2025-05-07T19:46:34.0847719Z #define __glibcxx_requires_sorted_set_pred(_First1,_Last1,_First2,_Pred) 2025-05-07T19:46:34.0847799Z #define __restrict_arr 2025-05-07T19:46:34.0847902Z #define _PSTL_PRAGMA_MESSAGE_POLICIES(x) 2025-05-07T19:46:34.0848038Z #define __glibcxx_requires_valid_range(_First,_Last) 2025-05-07T19:46:34.0848551Z #define strndupa(s,n) (__extension__ ({ const char *__old = (s); size_t __len = strnlen (__old, (n)); char *__new = (char *) __builtin_alloca (__len + 1); __new[__len] = '\0'; (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:34.0848723Z #define __attribute_artificial__ __attribute__ ((__artificial__)) 2025-05-07T19:46:34.0848801Z #define __USE_MISC 1 2025-05-07T19:46:34.0848903Z #define __UWORD_TYPE unsigned long int 2025-05-07T19:46:34.0848991Z #define _EXCEPTION_DEFINES_H 1 2025-05-07T19:46:34.0849072Z #define _GCC_LIMITS_H_ 2025-05-07T19:46:34.0849161Z #define __LDBL_DIG__ 18 2025-05-07T19:46:34.0849250Z #define __BIT_TYPES_DEFINED__ 1 2025-05-07T19:46:34.0849344Z #define __malloc_and_calloc_defined 2025-05-07T19:46:34.0849434Z #define __FLT64_IS_IEC_60559__ 2 2025-05-07T19:46:34.0849536Z #define _GLIBCXX_HAVE_SYS_SYSINFO_H 1 2025-05-07T19:46:34.0849613Z #define __x86_64__ 1 2025-05-07T19:46:34.0849689Z #define _SIZE_T_ 2025-05-07T19:46:34.0850586Z #define __bswap_constant_64(x) (__extension__ ((((x) & 0xff00000000000000ull) >> 56) | (((x) & 0x00ff000000000000ull) >> 40) | (((x) & 0x0000ff0000000000ull) >> 24) | (((x) & 0x000000ff00000000ull) >> 8) | (((x) & 0x00000000ff000000ull) << 8) | (((x) & 0x0000000000ff0000ull) << 24) | (((x) & 0x000000000000ff00ull) << 40) | (((x) & 0x00000000000000ffull) << 56))) 2025-05-07T19:46:34.0850679Z #define _POSIX2_COLL_WEIGHTS_MAX 2 2025-05-07T19:46:34.0850766Z #define __FLT32X_MIN_EXP__ (-1021) 2025-05-07T19:46:34.0850883Z #define 
__PTHREAD_RWLOCK_INT_FLAGS_SHARED 1 2025-05-07T19:46:34.0850996Z #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF 2025-05-07T19:46:34.0851083Z #define _IO_iconv_t _G_iconv_t 2025-05-07T19:46:34.0851195Z #define _GLIBCXX_FLOAT_IS_IEEE_BINARY32 1 2025-05-07T19:46:34.0851365Z #define __cpp_lib_make_reverse_iterator 201402 2025-05-07T19:46:34.0851491Z #define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(A) 2025-05-07T19:46:34.0851583Z #define _GLIBCXX_HAVE_DLFCN_H 1 2025-05-07T19:46:34.0852050Z #define strdupa(s) (__extension__ ({ const char *__old = (s); size_t __len = strlen (__old) + 1; char *__new = (char *) __builtin_alloca (__len); (char *) memcpy (__new, __old, __len); })) 2025-05-07T19:46:34.0852162Z #define __no_return__ __attribute__((noreturn)) 2025-05-07T19:46:34.0852296Z #define __device_builtin__ __location__(device_builtin) 2025-05-07T19:46:34.0852401Z #define _PSTL_HIDE_FROM_ABI_POP 2025-05-07T19:46:34.0852487Z #define _GLIBCXX_HAVE_ACOSF 1 2025-05-07T19:46:34.0852571Z #define STA_FLL 0x0008 2025-05-07T19:46:34.0852765Z #define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 2025-05-07T19:46:34.0852856Z #define _GLIBCXX_END_EXTERN_C } 2025-05-07T19:46:34.0852969Z #define __INT_FAST16_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0853074Z #define __cpp_lib_integer_sequence 201304 2025-05-07T19:46:34.0853164Z #define __stub_revoke 2025-05-07T19:46:34.0853246Z #define __timer_t_defined 1 2025-05-07T19:46:34.0853367Z #define _GLIBCXX11_DEPRECATED _GLIBCXX_DEPRECATED 2025-05-07T19:46:34.0853454Z #define INT_MAX __INT_MAX__ 2025-05-07T19:46:34.0853567Z #define ULLONG_MAX (LLONG_MAX * 2ULL + 1) 2025-05-07T19:46:34.0853660Z #define _GLIBCXX_END_NAMESPACE_CXX11 } 2025-05-07T19:46:34.0853745Z #define _GLIBCXX_ICONV_CONST 2025-05-07T19:46:34.0853848Z #define major(dev) gnu_dev_major (dev) 2025-05-07T19:46:34.0853949Z #define cudaArrayTextureGather 0x08 2025-05-07T19:46:34.0854045Z #define _GLIBCXX_LT_OBJDIR ".libs/" 2025-05-07T19:46:34.0854176Z #define __inline_hint__ 
__attribute__((nv_inline_hint)) 2025-05-07T19:46:34.0854274Z #define __NV_LEGACY_LAUNCH 1 2025-05-07T19:46:34.0854353Z #define _IO_off_t __off_t 2025-05-07T19:46:34.0854431Z #define __FLT64_DIG__ 15 2025-05-07T19:46:34.0854643Z #define PTHREAD_DESTRUCTOR_ITERATIONS _POSIX_THREAD_DESTRUCTOR_ITERATIONS 2025-05-07T19:46:34.0854731Z #define _POSIX2_LINE_MAX 2048 2025-05-07T19:46:34.0854846Z #define __UINT_FAST32_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0854965Z #define __UINT_LEAST64_TYPE__ long unsigned int 2025-05-07T19:46:34.0855050Z #define ADJ_FREQUENCY 0x0002 2025-05-07T19:46:34.0855141Z #define __CUDART_API_PTDS(api) api 2025-05-07T19:46:34.0855213Z #define NULL __null 2025-05-07T19:46:34.0855338Z #define cudaStreamPerThread ((cudaStream_t)0x2) 2025-05-07T19:46:34.0855432Z #define _GLIBCXX_CONSTEXPR constexpr 2025-05-07T19:46:34.0855529Z #define __U64_TYPE unsigned long int 2025-05-07T19:46:34.0855631Z #define __FLT_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0855715Z #define __FLT_MAX_10_EXP__ 38 2025-05-07T19:46:34.0855789Z #define FP_ZERO 2 2025-05-07T19:46:34.0855876Z #define _GLIBCXX_HAVE_FLOORL 1 2025-05-07T19:46:34.0856021Z #define __isgraph_l(c,l) __isctype_l((c), _ISgraph, (l)) 2025-05-07T19:46:34.0856123Z #define __LONG_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0856203Z #define __WCHAR_T__ 2025-05-07T19:46:34.0856293Z #define __FLT64X_HAS_DENORM__ 1 2025-05-07T19:46:34.0856476Z #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL 2025-05-07T19:46:34.0856612Z #define _GLIBCXX_NORETURN __attribute__ ((__noreturn__)) 2025-05-07T19:46:34.0856700Z #define __FLT_HAS_INFINITY__ 1 2025-05-07T19:46:34.0856821Z #define __GNUC_EXECUTION_CHARSET_NAME "UTF-8" 2025-05-07T19:46:34.0856928Z #define _GLIBCXX20_DEPRECATED_SUGGEST(ALT) 2025-05-07T19:46:34.0857050Z #define __WSTOPSIG(status) __WEXITSTATUS(status) 2025-05-07T19:46:34.0857179Z #define cudaSurfaceTypeCubemapLayered 0xFC 2025-05-07T19:46:34.0857264Z #define _BSD_PTRDIFF_T_ 
2025-05-07T19:46:34.0857349Z #define _SIGSET_H_types 1 2025-05-07T19:46:34.0857456Z #define cudaTextureType1DLayered 0xF1 2025-05-07T19:46:34.0857569Z #define __cpp_unicode_literals 200710L 2025-05-07T19:46:34.0857709Z #define __isdigit_l(c,l) __isctype_l((c), _ISdigit, (l)) 2025-05-07T19:46:34.0857805Z #define __LONG_LONG_PAIR(HI,LO) LO, HI 2025-05-07T19:46:34.0857981Z #define __UINT_FAST16_TYPE__ long unsigned int 2025-05-07T19:46:34.0858099Z #define __bos0(ptr) __builtin_object_size (ptr, 0) 2025-05-07T19:46:34.0858199Z #define __DEC64_MAX__ 9.999999999999999E384DD 2025-05-07T19:46:34.0858326Z #define M_1_PIl 0.318309886183790671537767526745028724L 2025-05-07T19:46:34.0858424Z #define __CUDACC_DEVICE_ATOMIC_BUILTINS__ 1 2025-05-07T19:46:34.0858580Z #define WIFSTOPPED(status) __WIFSTOPPED (__WAIT_INT (status)) 2025-05-07T19:46:34.0858663Z #define __INT_FAST32_WIDTH__ 64 2025-05-07T19:46:34.0858759Z #define _POSIX2_CHARCLASS_NAME_MAX 14 2025-05-07T19:46:34.0858844Z #define _GLIBCXX_BITS_STD_ABS_H 2025-05-07T19:46:34.0858921Z #define STA_MODE 0x4000 2025-05-07T19:46:34.0859084Z #define __CHAR16_TYPE__ short unsigned int 2025-05-07T19:46:34.0859176Z #define __PRAGMA_REDEFINE_EXTNAME 1 2025-05-07T19:46:34.0859277Z #define __glibcxx_signed_b(T,B) ((T)(-1) < 0) 2025-05-07T19:46:34.0859366Z #define __USING_NAMESPACE_C99(name) 2025-05-07T19:46:34.0859460Z #define BIG_ENDIAN __BIG_ENDIAN 2025-05-07T19:46:34.0859550Z #define __cudaCDP2EventRecord_ptsz 2025-05-07T19:46:34.0859633Z #define _GLIBCXX_HAVE_SINL 1 2025-05-07T19:46:34.0859741Z #define EXPR_NEST_MAX _POSIX2_EXPR_NEST_MAX 2025-05-07T19:46:34.0859821Z #define __SIZE_WIDTH__ 64 2025-05-07T19:46:34.0859928Z #define __BLKSIZE_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0860001Z #define __SEG_FS 1 2025-05-07T19:46:34.0860093Z #define _IO_size_t size_t 2025-05-07T19:46:34.0860187Z #define __INT_LEAST16_MAX__ 0x7fff 2025-05-07T19:46:34.0860282Z #define INT_MIN (-INT_MAX - 1) 2025-05-07T19:46:34.0860523Z #define 
__stub_lchmod 2025-05-07T19:46:34.0860610Z #define __DEC64_MANT_DIG__ 16 2025-05-07T19:46:34.0860707Z #define __INT64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0860800Z #define _GLIBCXX_MANGLE_SIZE_T m 2025-05-07T19:46:34.0860883Z #define __SEG_GS 1 2025-05-07T19:46:34.0861057Z #define __FLT32_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F32 2025-05-07T19:46:34.0861142Z #define _IOS_APPEND 8 2025-05-07T19:46:34.0861238Z #define __SIG_ATOMIC_WIDTH__ 32 2025-05-07T19:46:34.0861324Z #define _GLIBCXX_RELEASE 11 2025-05-07T19:46:34.0861414Z #define _GLIBCXX98_USE_C99_WCHAR 1 2025-05-07T19:46:34.0861503Z #define _IO_IS_APPENDING 0x1000 2025-05-07T19:46:34.0861608Z #define __INT_LEAST64_TYPE__ long int 2025-05-07T19:46:34.0861686Z #define htole16(x) (x) 2025-05-07T19:46:34.0861783Z #define __TEXTURE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:34.0861871Z #define _GLIBCXX_HAVE_FCNTL_H 1 2025-05-07T19:46:34.0861953Z #define __INT16_TYPE__ short int 2025-05-07T19:46:34.0862043Z #define __INT_LEAST8_TYPE__ signed char 2025-05-07T19:46:34.0862141Z #define __glibcxx_class_requires(_a,_b) 2025-05-07T19:46:34.0862236Z #define __cpp_structured_bindings 201606L 2025-05-07T19:46:34.0862348Z #define __align__(n) __attribute__((aligned(n))) 2025-05-07T19:46:34.0862426Z #define __SIZEOF_INT__ 4 2025-05-07T19:46:34.0862508Z #define __WCLONE 0x80000000 2025-05-07T19:46:34.0862589Z #define __DEC32_MAX_EXP__ 97 2025-05-07T19:46:34.0862666Z #define SEEK_HOLE 4 2025-05-07T19:46:34.0862748Z #define TIMER_ABSTIME 1 2025-05-07T19:46:34.0862830Z #define __INT_FAST8_MAX__ 0x7f 2025-05-07T19:46:34.0862909Z #define __CUDA_MATH_CRTIMP 2025-05-07T19:46:34.0863069Z #define __FLT128_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:46:34.0863174Z #define __INTPTR_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0863258Z #define __DRIVER_FUNCTIONS_H__ 2025-05-07T19:46:34.0863458Z #define __cpp_sized_deallocation 201309L 2025-05-07T19:46:34.0863562Z #define __MATH_FUNCTIONS_HPP__ 
2025-05-07T19:46:34.0863669Z #define __cpp_guaranteed_copy_elision 201606L 2025-05-07T19:46:34.0863919Z #define _LINUX_LIMITS_H 2025-05-07T19:46:34.0864003Z #define linux 1 2025-05-07T19:46:34.0864102Z #define MOD_MICRO ADJ_MICRO 2025-05-07T19:46:34.0864207Z #define _GLIBCXX_DEBUG_ASSERT(_Condition) 2025-05-07T19:46:34.0864299Z #define _GLIBCXX_HAVE_VSWSCANF 1 2025-05-07T19:46:34.0864404Z #define _GLIBCXX_HAVE_ISNAN 1 2025-05-07T19:46:34.0864602Z #define _XOPEN_IOV_MAX _POSIX_UIO_MAXIOV 2025-05-07T19:46:34.0864752Z #define __cudart_builtin__ __location__(cudart_builtin) 2025-05-07T19:46:34.0864848Z #define __cpp_lib_hypot 201603 2025-05-07T19:46:34.0864957Z #define __FLT64_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0865053Z #define _GLIBCXX_HAVE_WCTYPE_H 1 2025-05-07T19:46:34.0865142Z #define MOD_NANO ADJ_NANO 2025-05-07T19:46:34.0865239Z #define htole64(x) (x) 2025-05-07T19:46:34.0865341Z #define FP_ILOGBNAN (-2147483647 - 1) 2025-05-07T19:46:34.0865464Z #define _IO_stdout ((_IO_FILE*)(&_IO_2_1_stdout_)) 2025-05-07T19:46:34.0865560Z #define _IO_UPPERCASE 01000 2025-05-07T19:46:34.0866072Z #define cudaKernelNodeAttributeClusterSchedulingPolicyPreference cudaLaunchAttributeClusterSchedulingPolicyPreference 2025-05-07T19:46:34.0866211Z #define __USE_POSIX2 1 2025-05-07T19:46:34.0866311Z #define MOD_ESTERROR ADJ_ESTERROR 2025-05-07T19:46:34.0866411Z #define __WALL 0x40000000 2025-05-07T19:46:34.0866503Z #define _GLIBCXX_HAVE_LDEXPF 1 2025-05-07T19:46:34.0866590Z #define _XLOCALE_H 1 2025-05-07T19:46:34.0866685Z #define _GLIBCXX_USE_TMPNAM 1 2025-05-07T19:46:34.0866792Z #define __FLT32_MIN_10_EXP__ (-37) 2025-05-07T19:46:34.0866881Z #define __KEY_T_TYPE __S32_TYPE 2025-05-07T19:46:34.0866981Z #define __cudaGet_threadIdx() threadIdx 2025-05-07T19:46:34.0867080Z #define __EXCEPTIONS 1 2025-05-07T19:46:34.0867179Z #define __CUDART_API_PTSZ(api) api 2025-05-07T19:46:34.0867371Z #define __launch_bounds__(...) 
__annotate__(launch_bounds(__VA_ARGS__)) 2025-05-07T19:46:34.0867469Z #define __WORDSIZE 64 2025-05-07T19:46:34.0867560Z #define CLOCK_MONOTONIC 1 2025-05-07T19:46:34.0867648Z #define _STL_RELOPS_H 1 2025-05-07T19:46:34.0867738Z #define __PTRDIFF_WIDTH__ 64 2025-05-07T19:46:34.0867849Z #define __BEGIN_DECLS extern "C" { 2025-05-07T19:46:34.0867946Z #define _GLIBCXX_HAVE_SYS_IPC_H 1 2025-05-07T19:46:34.0868034Z #define __LDBL_MANT_DIG__ 64 2025-05-07T19:46:34.0868140Z #define _GLIBCXX_HAVE_TRUNCATE 1 2025-05-07T19:46:34.0868442Z #define cudaKernelNodeAttributeClusterDimension cudaLaunchAttributeClusterDimension 2025-05-07T19:46:34.0868679Z #define _PSTL_GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) 2025-05-07T19:46:34.0868799Z #define _GLIBCXX_NAMESPACE_CXX11 __cxx11:: 2025-05-07T19:46:34.0868902Z #define _GLIBCXX_NUMERIC_LIMITS 1 2025-05-07T19:46:34.0869002Z #define __cpp_range_based_for 201603L 2025-05-07T19:46:34.0869111Z #define __cpp_lib_exchange_function 201304 2025-05-07T19:46:34.0869217Z #define _GLIBCXX_HAVE_INTTYPES_H 1 2025-05-07T19:46:34.0869324Z #define _GLIBCXX_DARWIN_USE_64_BIT_INODE 1 2025-05-07T19:46:34.0869501Z #define cudaCooperativeLaunchMultiDeviceNoPostSync 0x02 2025-05-07T19:46:34.0869594Z #define __FLT64_HAS_INFINITY__ 1 2025-05-07T19:46:34.0869700Z #define _GLIBCXX_CSTDLIB 1 2025-05-07T19:46:34.0869804Z #define _GLIBCXX_DEBUG_MACRO_SWITCH_H 1 2025-05-07T19:46:34.0869980Z #define __FLT64X_MAX__ 1.18973149535723176502126385303097021e+4932F64x 2025-05-07T19:46:34.0870102Z #define __STDCPP_DEFAULT_NEW_ALIGNMENT__ 16 2025-05-07T19:46:34.0870191Z #define _STRING_H 1 2025-05-07T19:46:34.0870290Z #define _BITS_PTHREADTYPES_H 1 2025-05-07T19:46:34.0870392Z #define _GCC_MAX_ALIGN_T 2025-05-07T19:46:34.0870485Z #define __SM_32_INTRINSICS_HPP__ 2025-05-07T19:46:34.0870623Z #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) 2025-05-07T19:46:34.0870719Z #define __code_model_small__ 1 2025-05-07T19:46:34.0870816Z #define 
_PSTL_CONFIG_H 2025-05-07T19:46:34.0870919Z #define __GCC_ATOMIC_LONG_LOCK_FREE 2 2025-05-07T19:46:34.0871032Z #define __cpp_nontype_template_args 201411L 2025-05-07T19:46:34.0871134Z #define __SM_20_INTRINSICS_H__ 2025-05-07T19:46:34.0871232Z #define cudaCpuDeviceId ((int)-1) 2025-05-07T19:46:34.0871586Z #define assert(expr) ((expr) ? __ASSERT_VOID_CAST (0) : __assert_fail (__STRING(expr), __FILE__, __LINE__, __ASSERT_FUNCTION)) 2025-05-07T19:46:34.0871683Z #define __DEC32_MANT_DIG__ 7 2025-05-07T19:46:34.0871779Z #define le64toh(x) (x) 2025-05-07T19:46:34.0871864Z #define FILENAME_MAX 4096 2025-05-07T19:46:34.0872072Z #define __iscntrl_l(c,l) __isctype_l((c), _IScntrl, (l)) 2025-05-07T19:46:34.0872192Z #define __cpp_return_type_deduction 201304L 2025-05-07T19:46:34.0872276Z #define L_cuserid 9 2025-05-07T19:46:34.0872363Z #define __ino_t_defined 2025-05-07T19:46:34.0872443Z #define __k8__ 1 2025-05-07T19:46:34.0872544Z #define __INTPTR_TYPE__ long int 2025-05-07T19:46:34.0872652Z #define __UINT16_TYPE__ short unsigned int 2025-05-07T19:46:34.0872741Z #define __int8_t_defined 2025-05-07T19:46:34.0872846Z #define __WCHAR_TYPE__ int 2025-05-07T19:46:34.0872942Z #define __CLOCKID_T_TYPE __S32_TYPE 2025-05-07T19:46:34.0873053Z #define cudaHostRegisterPortable 0x01 2025-05-07T19:46:34.0873148Z #define __SLONGWORD_TYPE long int 2025-05-07T19:46:34.0873330Z #define _GLIBCXX_PACKAGE_TARNAME "libstdc++" 2025-05-07T19:46:34.0873484Z #define __isblank_l(c,l) __isctype_l((c), _ISblank, (l)) 2025-05-07T19:46:34.0873571Z #define __HAVE_COLUMN 2025-05-07T19:46:34.0873666Z #define __stub_fdetach 2025-05-07T19:46:34.0874102Z #define __CUDACC_VER__ "__CUDACC_VER__ is no longer supported. Use __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and __CUDACC_VER_BUILD__ instead." 
2025-05-07T19:46:34.0874184Z #define __pic__ 2 2025-05-07T19:46:34.0874310Z #define __UINTPTR_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0874407Z #define CLOCKS_PER_SEC 1000000l 2025-05-07T19:46:34.0874499Z #define __INT_FAST64_WIDTH__ 64 2025-05-07T19:46:34.0874600Z #define _GLIBCXX_HAVE_SOCKATMARK 1 2025-05-07T19:46:34.0874697Z #define __stub_chflags 2025-05-07T19:46:34.0874785Z #define CLOCK_BOOTTIME 7 2025-05-07T19:46:34.0874873Z #define __need_IOV_MAX 2025-05-07T19:46:34.0874991Z #define putc(_ch,_fp) _IO_putc (_ch, _fp) 2025-05-07T19:46:34.0875093Z #define __UQUAD_TYPE unsigned long int 2025-05-07T19:46:34.0875191Z #define __cpp_decltype 200707L 2025-05-07T19:46:34.0875292Z #define __BYTE_ORDER __LITTLE_ENDIAN 2025-05-07T19:46:34.0875398Z #define _GLIBCXX_USE_C99 1 2025-05-07T19:46:34.0875504Z #define _GLIBCXX_TR1_BETA_FUNCTION_TCC 1 2025-05-07T19:46:34.0875596Z #define TTY_NAME_MAX 32 2025-05-07T19:46:34.0875776Z #define _GLIBCXX_FORWARD(_Tp,__val) std::forward<_Tp>(__val) 2025-05-07T19:46:34.0875898Z #define __INT_FAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0876181Z #define _PSTL_ASSERT(_Condition) __glibcxx_assert(_Condition) 2025-05-07T19:46:34.0876290Z #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 2025-05-07T19:46:34.0876387Z #define __LITTLE_ENDIAN 1234 2025-05-07T19:46:34.0876471Z #define STA_PPSTIME 0x0004 2025-05-07T19:46:34.0876549Z #define __import__ 2025-05-07T19:46:34.0876639Z #define BUFSIZ _IO_BUFSIZ 2025-05-07T19:46:34.0876764Z #define M_SQRT2l 1.414213562373095048801688724209698079L 2025-05-07T19:46:34.0876842Z #define __export__ 2025-05-07T19:46:34.0876956Z #define __FSID_T_TYPE struct { int __val[2]; } 2025-05-07T19:46:34.0877062Z #define cudaMemAttachHost 0x02 2025-05-07T19:46:34.0877220Z #define __FLT_NORM_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:46:34.0877308Z #define _GLIBCXX_HAVE_ICONV 1 2025-05-07T19:46:34.0877409Z #define _GLIBCXX_SYMVER 1 2025-05-07T19:46:34.0877498Z #define __FLT64X_MAX_EXP__ 16384 
2025-05-07T19:46:34.0877579Z #define _WCHAR_T_DECLARED 2025-05-07T19:46:34.0877689Z #define __UINT_FAST64_TYPE__ long unsigned int 2025-05-07T19:46:34.0877808Z #define isalpha_l(c,l) __isalpha_l ((c), (l)) 2025-05-07T19:46:34.0877903Z #define __cpp_inline_variables 201606L 2025-05-07T19:46:34.0877987Z #define WNOWAIT 0x01000000 2025-05-07T19:46:34.0878074Z #define PLOSS 6 2025-05-07T19:46:34.0878158Z #define M_LN10 2.30258509299404568402 2025-05-07T19:46:34.0878409Z #define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626) 2025-05-07T19:46:34.0878501Z #define EXIT_SUCCESS 0 2025-05-07T19:46:34.0878596Z #define __LDBL_REDIR_DECL(name) 2025-05-07T19:46:34.0878687Z #define _GLIBCXX_HAVE_STRTOF 1 2025-05-07T19:46:34.0878777Z #define MOD_FREQUENCY ADJ_FREQUENCY 2025-05-07T19:46:34.0878870Z #define __thread__ __thread 2025-05-07T19:46:34.0878960Z #define _GLIBCXX_HAVE_MEMORY_H 1 2025-05-07T19:46:34.0879103Z #define __INT_MAX__ 0x7fffffff 2025-05-07T19:46:34.0879208Z #define __SIZEOF_PTHREAD_BARRIER_T 32 2025-05-07T19:46:34.0879425Z #define __glibcxx_requires_partitioned_upper_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:34.0879529Z #define __cudaCDP2StreamWaitEvent_ptsz 2025-05-07T19:46:34.0879618Z #define _GLIBCXX_HAVE_SINF 1 2025-05-07T19:46:34.0879706Z #define __linux__ 1 2025-05-07T19:46:34.0879795Z #define STA_PPSSIGNAL 0x0100 2025-05-07T19:46:34.0879917Z #define M_LN2l 0.693147180559945309417232121458176568L 2025-05-07T19:46:34.0880007Z #define __S16_TYPE short int 2025-05-07T19:46:34.0880343Z #define __glibcxx_constexpr_assert(cond) if (__builtin_is_constant_evaluated() && !bool(cond)) __builtin_unreachable() 2025-05-07T19:46:34.0880489Z #define __NVCC_DIAG_PRAGMA_SUPPORT__ 1 2025-05-07T19:46:34.0880671Z #define __bos(ptr) __builtin_object_size (ptr, __USE_FORTIFY_LEVEL > 1) 2025-05-07T19:46:34.0880770Z #define __COMMON_FUNCTIONS_H__ 2025-05-07T19:46:34.0880864Z #define UINT_MAX (INT_MAX * 2U + 1U) 
2025-05-07T19:46:34.0880940Z #define _T_SIZE_ 2025-05-07T19:46:34.0881040Z #define LLONG_MAX __LONG_LONG_MAX__ 2025-05-07T19:46:34.0881145Z #define __cudaCDP2StreamCreateWithFlags 2025-05-07T19:46:34.0881229Z #define _PSTL_VERSION 12000 2025-05-07T19:46:34.0881347Z #define __noinline__ __attribute__((noinline)) 2025-05-07T19:46:34.0881431Z #define __WNOTHREAD 0x20000000 2025-05-07T19:46:34.0881520Z #define _G_va_list __gnuc_va_list 2025-05-07T19:46:34.0881641Z #define M_PI_4l 0.785398163397448309615660845819875721L 2025-05-07T19:46:34.0881728Z #define _IOS_INPUT 1 2025-05-07T19:46:34.0881812Z #define __USE_LARGEFILE64 1 2025-05-07T19:46:34.0881908Z #define _GLIBCXX_TR1_EXP_INTEGRAL_TCC 1 2025-05-07T19:46:34.0882005Z #define __INT64_TYPE__ long int 2025-05-07T19:46:34.0882089Z #define _POSIX_SSIZE_MAX 32767 2025-05-07T19:46:34.0882181Z #define __shared__ __location__(shared) 2025-05-07T19:46:34.0882266Z #define __FLT_MAX_EXP__ 128 2025-05-07T19:46:34.0882418Z #define __glibc_unlikely(cond) __builtin_expect((cond), 0) 2025-05-07T19:46:34.0882501Z #define __gid_t_defined 2025-05-07T19:46:34.0882605Z #define _GLIBCXX_USE_SC_NPROCESSORS_ONLN 1 2025-05-07T19:46:34.0882704Z #define __ORDER_BIG_ENDIAN__ 4321 2025-05-07T19:46:34.0882887Z #define __glibcxx_requires_can_increment_range(_First1,_Last1,_First2) 2025-05-07T19:46:34.0882977Z #define _GLIBCXX17_INLINE inline 2025-05-07T19:46:34.0883059Z #define __DBL_MANT_DIG__ 53 2025-05-07T19:46:34.0883146Z #define ___int_size_t_h 2025-05-07T19:46:34.0883242Z #define __FSBLKCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:34.0883356Z #define __cpp_inheriting_constructors 201511L 2025-05-07T19:46:34.0883506Z #define __WIFCONTINUED(status) ((status) == __W_CONTINUED) 2025-05-07T19:46:34.0883602Z #define CUDA_DOUBLE_MATH_FUNCTIONS 1 2025-05-07T19:46:34.0883694Z #define _GLIBCXX_HAVE_FENV_H 1 2025-05-07T19:46:34.0883785Z #define _GLIBCXX_HAVE_STDBOOL_H 1 2025-05-07T19:46:34.0883883Z #define __SIZEOF_FLOAT128__ 16 
2025-05-07T19:46:34.0883999Z #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0884102Z #define _GLIBCXX_TR1_HYPERGEOMETRIC_TCC 1 2025-05-07T19:46:34.0884224Z #define _GLIBCXX_DEBUG_PEDASSERT(_Condition) 2025-05-07T19:46:34.0884308Z #define __clock_t_defined 1 2025-05-07T19:46:34.0884400Z #define _POSIX_SEM_VALUE_MAX 32767 2025-05-07T19:46:34.0884512Z #define __cudaCDP2RuntimeGetVersion 2025-05-07T19:46:34.0884598Z #define __GLIBC_MINOR__ 17 2025-05-07T19:46:34.0884685Z #define __DEC64_MIN__ 1E-383DD 2025-05-07T19:46:34.0884776Z #define __WINT_TYPE__ unsigned int 2025-05-07T19:46:34.0884888Z #define __UINT_LEAST32_TYPE__ unsigned int 2025-05-07T19:46:34.0884973Z #define __SIZEOF_SHORT__ 2 2025-05-07T19:46:34.0885138Z #define __FLT32_NORM_MAX__ 3.40282346638528859811704183484516925e+38F32 2025-05-07T19:46:34.0885217Z #define __SSE__ 1 2025-05-07T19:46:34.0885302Z #define SEM_VALUE_MAX (2147483647) 2025-05-07T19:46:34.0885389Z #define M_SQRT1_2 0.70710678118654752440 2025-05-07T19:46:34.0885468Z #define _CTYPE_H 1 2025-05-07T19:46:34.0885618Z #define __sigset_t_defined 2025-05-07T19:46:34.0885709Z #define __LDBL_MIN_EXP__ (-16381) 2025-05-07T19:46:34.0885795Z #define _GLIBCXX_HAVE_LOGF 1 2025-05-07T19:46:34.0885885Z #define MOD_TAI ADJ_TAI 2025-05-07T19:46:34.0885974Z #define _IO_va_list __gnuc_va_list 2025-05-07T19:46:34.0886058Z #define _GLIBCXX_HAVE_LOGL 1 2025-05-07T19:46:34.0886134Z #define __SM_70_RT_H__ 2025-05-07T19:46:34.0886234Z #define _GLIBCXX_HAVE_WRITEV 1 2025-05-07T19:46:34.0886327Z #define cudaEventWaitDefault 0x00 2025-05-07T19:46:34.0886413Z #define _GLIBCXX_HAVE_EXPL 1 2025-05-07T19:46:34.0886574Z #define __FLT64_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:46:34.0886662Z #define _POSIX_MAX_CANON 255 2025-05-07T19:46:34.0886873Z #define _GLIBCXX_NOEXCEPT_PARM , bool _NE 2025-05-07T19:46:34.0886964Z #define FD_SETSIZE __FD_SETSIZE 2025-05-07T19:46:34.0887063Z #define _GLIBCXX_TXN_SAFE 
2025-05-07T19:46:34.0887141Z #define __amd64__ 1 2025-05-07T19:46:34.0887229Z #define __WINT_WIDTH__ 32 2025-05-07T19:46:34.0887337Z #define __CUDA_DEVICE_RUNTIME_API_H__ 2025-05-07T19:46:34.0887596Z #define __REDIRECT_NTHNL(name,proto,alias) name proto __THROWNL __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:34.0887690Z #define _GLIBCXX_STDIO_SEEK_CUR 1 2025-05-07T19:46:34.0887766Z #define EOF (-1) 2025-05-07T19:46:34.0887867Z #define __WAIT_STATUS_DEFN void * 2025-05-07T19:46:34.0887954Z #define __USE_POSIX199309 1 2025-05-07T19:46:34.0888040Z #define __INT_LEAST64_WIDTH__ 64 2025-05-07T19:46:34.0888134Z #define __LDBL_MAX_EXP__ 16384 2025-05-07T19:46:34.0888218Z #define __FLT32X_MAX_10_EXP__ 308 2025-05-07T19:46:34.0888305Z #define LLONG_MIN (-LLONG_MAX-1) 2025-05-07T19:46:34.0888407Z #define cudaSurfaceType2DLayered 0xF2 2025-05-07T19:46:34.0888505Z #define ____mbstate_t_defined 1 2025-05-07T19:46:34.0888584Z #define STA_NANO 0x2000 2025-05-07T19:46:34.0888672Z #define _GLIBCXX_HAVE_LOG10F 1 2025-05-07T19:46:34.0888767Z #define _GLIBCXX_HAVE_LOG10L 1 2025-05-07T19:46:34.0888850Z #define _IO_LINKED 0x80 2025-05-07T19:46:34.0888942Z #define __cpp_lib_launder 201606 2025-05-07T19:46:34.0889027Z #define __SIZEOF_INT128__ 16 2025-05-07T19:46:34.0889136Z #define __PTHREAD_MUTEX_HAVE_PREV 1 2025-05-07T19:46:34.0889222Z #define __FLT64X_IS_IEC_60559__ 2 2025-05-07T19:46:34.0889312Z #define _GLIBCXX_TYPE_TRAITS 1 2025-05-07T19:46:34.0889445Z #define cudaGraphKernelNodePortProgrammatic 1 2025-05-07T19:46:34.0889546Z #define __DEVICE_ATOMIC_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0889640Z #define __BLKCNT64_T_TYPE __SQUAD_TYPE 2025-05-07T19:46:34.0889731Z #define __LDBL_MAX_10_EXP__ 4932 2025-05-07T19:46:34.0889827Z #define __W_CONTINUED 0xffff 2025-05-07T19:46:34.0889915Z #define __ATOMIC_RELAXED 0 2025-05-07T19:46:34.0890036Z #define w_coredump __wait_terminated.__w_coredump 2025-05-07T19:46:34.0890161Z #define __FSBLKCNT_T_TYPE __SYSCALL_ULONG_TYPE 
2025-05-07T19:46:34.0890343Z #define __cudaCDP2OccupancyMaxActiveBlocksPerMultiprocessor 2025-05-07T19:46:34.0890520Z #define __DBL_EPSILON__ double(2.22044604925031308084726333618164062e-16L) 2025-05-07T19:46:34.0890612Z #define __stub_stty 2025-05-07T19:46:34.0890768Z #define _tolower(c) ((int) (*__ctype_tolower_loc ())[(int) (c)]) 2025-05-07T19:46:34.0890849Z #define le16toh(x) (x) 2025-05-07T19:46:34.0890952Z #define BC_SCALE_MAX _POSIX2_BC_SCALE_MAX 2025-05-07T19:46:34.0891125Z #define __FLT128_MIN__ 3.36210314311209350626267781732175260e-4932F128 2025-05-07T19:46:34.0891203Z #define _SIZET_ 2025-05-07T19:46:34.0891290Z #define XATTR_NAME_MAX 255 2025-05-07T19:46:34.0891380Z #define _SVID_SOURCE 1 2025-05-07T19:46:34.0891456Z #define _LP64 1 2025-05-07T19:46:34.0891539Z #define _LIBC_LIMITS_H_ 1 2025-05-07T19:46:34.0891767Z #define __REDIRECT_NTH_LDBL(name,proto,alias) __REDIRECT_NTH (name, proto, alias) 2025-05-07T19:46:34.0891883Z #define _GLIBCXX_TR1_BESSEL_FUNCTION_TCC 1 2025-05-07T19:46:34.0891961Z #define __UINT8_C(c) c 2025-05-07T19:46:34.0892050Z #define _GLIBCXX_HAVE_CEILF 1 2025-05-07T19:46:34.0892149Z #define _GLIBCXX_HAVE_CEILL 1 2025-05-07T19:46:34.0892305Z #define __cudaCDP2Memset3DAsync_ptsz 2025-05-07T19:46:34.0892390Z #define __CUDA_ARCH_LIST__ 520 2025-05-07T19:46:34.0892477Z #define __FLT64_MAX_EXP__ 1024 2025-05-07T19:46:34.0892583Z #define MOD_MAXERROR ADJ_MAXERROR 2025-05-07T19:46:34.0892660Z #define CUDARTAPI 2025-05-07T19:46:34.0892740Z #define IOV_MAX 1024 2025-05-07T19:46:34.0892888Z #define __glibcxx_requires_irreflexive2(_First,_Last) 2025-05-07T19:46:34.0892978Z #define __INT_LEAST32_TYPE__ int 2025-05-07T19:46:34.0893067Z #define P_tmpdir "/tmp" 2025-05-07T19:46:34.0893166Z #define cudaMemAttachSingle 0x04 2025-05-07T19:46:34.0893257Z #define __wchar_t__ 2025-05-07T19:46:34.0893352Z #define __cpp_lib_is_aggregate 201703 2025-05-07T19:46:34.0893431Z #define SEEK_END 2 2025-05-07T19:46:34.0893589Z #define __SIZEOF_WCHAR_T__ 4 
2025-05-07T19:46:34.0893750Z #define _GLIBCXX_USE_TBB_PAR_BACKEND __has_include() 2025-05-07T19:46:34.0893841Z #define _IO_ftrylockfile(_fp) 2025-05-07T19:46:34.0893978Z #define _GLIBCXX_USE_C99_WCHAR _GLIBCXX11_USE_C99_WCHAR 2025-05-07T19:46:34.0894080Z #define ____FILE_defined 1 2025-05-07T19:46:34.0894187Z #define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 2025-05-07T19:46:34.0894274Z #define __GNUC_PATCHLEVEL__ 0 2025-05-07T19:46:34.0894365Z #define _ISOC99_SOURCE 1 2025-05-07T19:46:34.0894455Z #define __VECTOR_FUNCTIONS_H__ 2025-05-07T19:46:34.0894690Z #define __REDIRECT_NTH(name,proto,alias) name proto __THROW __asm__ (__ASMNAME (#alias)) 2025-05-07T19:46:34.0894819Z #define _PSTL_USE_NONTEMPORAL_STORES_IF_ALLOWED 2025-05-07T19:46:34.0894897Z #define _IO_RIGHT 04 2025-05-07T19:46:34.0894985Z #define __END_NAMESPACE_STD 2025-05-07T19:46:34.0895162Z #define __FLT128_NORM_MAX__ 1.18973149535723176508575932662800702e+4932F128 2025-05-07T19:46:34.0895262Z #define _GLIBCXX_STD_C std 2025-05-07T19:46:34.0895369Z #define cudaInitDeviceFlagsAreValid 0x01 2025-05-07T19:46:34.0895455Z #define _LARGEFILE64_SOURCE 1 2025-05-07T19:46:34.0895558Z #define _GLIBCXX_USE_C99_STDINT_TR1 1 2025-05-07T19:46:34.0895639Z #define _STDDEF_H_ 2025-05-07T19:46:34.0895801Z #define __FLT64_NORM_MAX__ 1.79769313486231570814527423731704357e+308F64 2025-05-07T19:46:34.0895888Z #define __FLT128_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0896005Z #define isalnum_l(c,l) __isalnum_l ((c), (l)) 2025-05-07T19:46:34.0896193Z #define __FD_ISSET(d,set) ((__FDS_BITS (set)[__FD_ELT (d)] & __FD_MASK (d)) != 0) 2025-05-07T19:46:34.0896295Z #define __INTMAX_MAX__ 0x7fffffffffffffffL 2025-05-07T19:46:34.0896440Z #define __glibcxx_requires_irreflexive(_First,_Last) 2025-05-07T19:46:34.0896550Z #define cudaGraphKernelNodePortDefault 0 2025-05-07T19:46:34.0896647Z #define __INT_FAST8_TYPE__ signed char 2025-05-07T19:46:34.0896747Z #define __cudaCDP2Memcpy3DAsync_ptsz 2025-05-07T19:46:34.0896848Z #define __PID_T_TYPE 
__S32_TYPE 2025-05-07T19:46:34.0896957Z #define __cpp_namespace_attributes 201411L 2025-05-07T19:46:34.0897046Z #define CHARCLASS_NAME_MAX 2048 2025-05-07T19:46:34.0897140Z #define _GLIBCXX_HAVE_TANF 1 2025-05-07T19:46:34.0897227Z #define _GLIBCXX_USE_ST_MTIM 1 2025-05-07T19:46:34.0897393Z #define __FLT64X_MIN__ 3.36210314311209350626267781732175260e-4932F64x 2025-05-07T19:46:34.0897492Z #define __CUDA_RUNTIME_H__ 2025-05-07T19:46:34.0897660Z #define WIFSIGNALED(status) __WIFSIGNALED (__WAIT_INT (status)) 2025-05-07T19:46:34.0897753Z #define _GLIBCXX_HAVE_STDLIB_H 1 2025-05-07T19:46:34.0897841Z #define __STDCPP_THREADS__ 1 2025-05-07T19:46:34.0897983Z #define M_2_SQRTPIl 1.128379167095512573896158903121545172L 2025-05-07T19:46:34.0898071Z #define __GNUC_STDC_INLINE__ 1 2025-05-07T19:46:34.0898156Z #define _POSIX_UIO_MAXIOV 16 2025-05-07T19:46:34.0898256Z #define _PSTL_PAR_BACKEND_SERIAL 2025-05-07T19:46:34.0898367Z #define __ASSERT_FUNCTION __PRETTY_FUNCTION__ 2025-05-07T19:46:34.0898459Z #define __FLT64_HAS_DENORM__ 1 2025-05-07T19:46:34.0898555Z #define __WORDSIZE_TIME64_COMPAT32 1 2025-05-07T19:46:34.0898703Z #define _GLIBCXX_DEPRECATED __attribute__ ((__deprecated__)) 2025-05-07T19:46:34.0898859Z #define __FLT32_EPSILON__ 1.19209289550781250000000000000000000e-7F32 2025-05-07T19:46:34.0899005Z #define _PSTL_HIDE_FROM_ABI_PUSH 2025-05-07T19:46:34.0899123Z #define cudaStreamLegacy ((cudaStream_t)0x1) 2025-05-07T19:46:34.0899227Z #define _IO_cleanup_region_start(_fct,_fp) 2025-05-07T19:46:34.0899323Z #define __location__(a) __annotate__(a) 2025-05-07T19:46:34.0899551Z #define __device_builtin_surface_type__ __location__(device_builtin_surface_type) 2025-05-07T19:46:34.0899642Z #define _POSIX2_BC_BASE_MAX 99 2025-05-07T19:46:34.0899743Z #define __cudaCDP2DeviceGetAttribute 2025-05-07T19:46:34.0899831Z #define __DBL_DECIMAL_DIG__ 17 2025-05-07T19:46:34.0899927Z #define __STDC_UTF_32__ 1 2025-05-07T19:46:34.0900011Z #define __INT_FAST8_WIDTH__ 8 
2025-05-07T19:46:34.0900100Z #define NAN (__builtin_nanf ("")) 2025-05-07T19:46:34.0900247Z #define _POSIX_MQ_PRIO_MAX 32 2025-05-07T19:46:34.0900322Z #define __FXSR__ 1 2025-05-07T19:46:34.0900397Z #define _SIZE_T 2025-05-07T19:46:34.0900491Z #define _GLIBCXX_USE_GETTIMEOFDAY 1 2025-05-07T19:46:34.0900611Z #define cudaHostRegisterReadOnly 0x08 2025-05-07T19:46:34.0900769Z #define __FLT32X_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:46:34.0900904Z #define __WIFSTOPPED(status) (((status) & 0xff) == 0x7f) 2025-05-07T19:46:34.0901000Z #define _IO_ssize_t __ssize_t 2025-05-07T19:46:34.0901088Z #define __ULONG32_TYPE unsigned int 2025-05-07T19:46:34.0901260Z #define __DBL_NORM_MAX__ double(1.79769313486231570814527423731704357e+308L) 2025-05-07T19:46:34.0901455Z #define cudaStreamGraphTailLaunch (cudaStream_t)0x0100000000000000 2025-05-07T19:46:34.0901542Z #define _GXX_NULLPTR_T 2025-05-07T19:46:34.0901653Z #define __glibcxx_class_requires3(_a,_b,_c,_d) 2025-05-07T19:46:34.0901733Z #define FOPEN_MAX 16 2025-05-07T19:46:34.0901824Z #define __BIG_ENDIAN 4321 2025-05-07T19:46:34.0901939Z #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ 2025-05-07T19:46:34.0902025Z #define __suseconds_t_defined 2025-05-07T19:46:34.0902115Z #define __off_t_defined 2025-05-07T19:46:34.0902194Z #define stderr stderr 2025-05-07T19:46:34.0902286Z #define M_LOG10E 0.43429448190325182765 2025-05-07T19:46:34.0902387Z #define __glibcxx_requires_string(_String) 2025-05-07T19:46:34.0902486Z #define _GLIBCXX_HAVE_LDEXPL 1 2025-05-07T19:46:34.0902572Z #define __INTMAX_WIDTH__ 64 2025-05-07T19:46:34.0902971Z #define _PSTL_CPP14_2RANGE_MISMATCH_EQUAL_PRESENT (_MSC_VER >= 1900 || __cplusplus >= 201300L || __cpp_lib_robust_nonmodifying_seq_ops == 201304) 2025-05-07T19:46:34.0903066Z #define __mode_t_defined 2025-05-07T19:46:34.0903144Z #define _GCC_SIZE_T 2025-05-07T19:46:34.0903235Z #define __INO64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:34.0903325Z #define __cpp_runtime_arrays 198712L 
2025-05-07T19:46:34.0903533Z #define __UINT64_TYPE__ long unsigned int 2025-05-07T19:46:34.0903627Z #define __USE_XOPEN2K8XSI 1 2025-05-07T19:46:34.0903707Z #define __UINT32_C(c) c ## U 2025-05-07T19:46:34.0903989Z #define __cpp_alias_templates 200704L 2025-05-07T19:46:34.0904095Z #define cudaHostAllocMapped 0x02 2025-05-07T19:46:34.0904206Z #define __DEVICE_LAUNCH_PARAMETERS_H__ 2025-05-07T19:46:34.0904294Z #define _STL_ITERATOR_H 1 2025-05-07T19:46:34.0904385Z #define __size_t__ 2025-05-07T19:46:34.0904513Z #define cudaStreamAttrID cudaLaunchAttributeID 2025-05-07T19:46:34.0904607Z #define _GLIBCXX_HAVE_ATANF 1 2025-05-07T19:46:34.0904722Z #define cudaEventRecordExternal 0x01 2025-05-07T19:46:34.0904872Z #define __isspace_l(c,l) __isctype_l((c), _ISspace, (l)) 2025-05-07T19:46:34.0904966Z #define _IO_BUFSIZ _G_BUFSIZ 2025-05-07T19:46:34.0905145Z #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F 2025-05-07T19:46:34.0905227Z #define _ENDIAN_H 1 2025-05-07T19:46:34.0905331Z #define __builtin_align__(a) __align__(a) 2025-05-07T19:46:34.0905423Z #define _GLIBCXX20_CONSTEXPR 2025-05-07T19:46:34.0905536Z #define __NV_NO_HOST_COMPILER_CHECK 1 2025-05-07T19:46:34.0905616Z #define __try try 2025-05-07T19:46:34.0905712Z #define _GLIBCXX_HAVE_FINITE 1 2025-05-07T19:46:34.0905815Z #define __FLT128_IS_IEC_60559__ 2 2025-05-07T19:46:34.0905974Z #define __INT8_MAX__ 0x7f 2025-05-07T19:46:34.0906236Z #define cudaStreamGetCaptureInfo __CUDART_API_PTSZ(cudaStreamGetCaptureInfo_v2) 2025-05-07T19:46:34.0906328Z #define __LONG_WIDTH__ 64 2025-05-07T19:46:34.0906412Z #define __PIC__ 2 2025-05-07T19:46:34.0906526Z #define BC_STRING_MAX _POSIX2_BC_STRING_MAX 2025-05-07T19:46:34.0906642Z #define __UINT_FAST32_TYPE__ long unsigned int 2025-05-07T19:46:34.0906785Z #define FD_ISSET(fd,fdsetp) __FD_ISSET (fd, fdsetp) 2025-05-07T19:46:34.0906879Z #define _GLIBCXX_HAVE_FLOAT_H 1 2025-05-07T19:46:34.0906970Z #define _GLIBCXX_HAVE_ATANL 1 2025-05-07T19:46:34.0907160Z #define 
__FLT32X_NORM_MAX__ 1.79769313486231570814527423731704357e+308F32x 2025-05-07T19:46:34.0907268Z #define __DEVICE_FUNCTIONS_HPP__ 2025-05-07T19:46:34.0907420Z #define __CHAR32_TYPE__ unsigned int 2025-05-07T19:46:34.0907509Z #define _IO_uid_t __uid_t 2025-05-07T19:46:34.0907617Z #define _GLIBCXX_HAVE_READLINK 1 2025-05-07T19:46:34.0907743Z #define __cudaCDP2EventRecordWithFlags_ptsz 2025-05-07T19:46:34.0907840Z #define _CONCEPT_CHECK_H 1 2025-05-07T19:46:34.0907991Z #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F 2025-05-07T19:46:34.0908099Z #define _GLIBCXX_HAVE_NETINET_IN_H 1 2025-05-07T19:46:34.0908219Z #define _GLIBCXX_TR1_SPECIAL_FUNCTION_UTIL_H 1 2025-05-07T19:46:34.0908303Z #define LONG_BIT 64 2025-05-07T19:46:34.0908420Z #define __SIZEOF_PTHREAD_BARRIERATTR_T 4 2025-05-07T19:46:34.0908518Z #define _GLIBCXX_USE_ALLOCATOR_NEW 1 2025-05-07T19:46:34.0908838Z #define __cpp_lib_math_special_functions 201603L 2025-05-07T19:46:34.0908962Z #define __fsfilcnt_t_defined 2025-05-07T19:46:34.0909052Z #define __blkcnt_t_defined 2025-05-07T19:46:34.0909323Z #define cudaKernelNodeAttributeMemSyncDomain cudaLaunchAttributeMemSyncDomain 2025-05-07T19:46:34.0909418Z #define __USE_LARGEFILE 1 2025-05-07T19:46:34.0909527Z #define __cpp_constexpr 201603L 2025-05-07T19:46:34.0909617Z #define CUDART_VERSION 12080 2025-05-07T19:46:34.0909707Z #define NL_TEXTMAX INT_MAX 2025-05-07T19:46:34.0909825Z #define cudaDeviceMapHost 0x08 2025-05-07T19:46:34.0909910Z #define _GLIBCXX_CMATH 1 2025-05-07T19:46:34.0910110Z #define __attribute_format_arg__(x) __attribute__ ((__format_arg__ (x))) 2025-05-07T19:46:34.0910200Z #define __lldiv_t_defined 1 2025-05-07T19:46:34.0910288Z #define __SSE2__ 1 2025-05-07T19:46:34.0910366Z #define _IOLBF 1 2025-05-07T19:46:34.0910463Z #define _GLIBCXX_HAVE_SYS_TYPES_H 1 2025-05-07T19:46:34.0910565Z #define _GLIBCXX_HAVE_FLOORF 1 2025-05-07T19:46:34.0910666Z #define __cpp_deduction_guides 201703L 2025-05-07T19:46:34.0910755Z #define 
_GLIBCXX_HAVE_EXPF 1 2025-05-07T19:46:34.0910860Z #define __annotate__(a) __attribute__((a)) 2025-05-07T19:46:34.0910957Z #define __INT32_TYPE__ int 2025-05-07T19:46:34.0911044Z #define __SIZEOF_DOUBLE__ 8 2025-05-07T19:46:34.0911146Z #define cudaDeviceSyncMemops 0x80 2025-05-07T19:46:34.0911252Z #define __cpp_exceptions 199711L 2025-05-07T19:46:34.0911346Z #define __FLT_MIN_10_EXP__ (-37) 2025-05-07T19:46:34.0911450Z #define cudaDeviceScheduleYield 0x02 2025-05-07T19:46:34.0911548Z #define _SYS_SYSMACROS_H 1 2025-05-07T19:46:34.0911677Z #define _GLIBCXX_TR1_LEGENDRE_FUNCTION_TCC 1 2025-05-07T19:46:34.0911837Z #define __FLT64_MIN__ 2.22507385850720138309023271733240406e-308F64 2025-05-07T19:46:34.0911928Z #define __INT_LEAST32_WIDTH__ 32 2025-05-07T19:46:34.0912034Z #define __SWORD_TYPE long int 2025-05-07T19:46:34.0912127Z #define __INTMAX_TYPE__ long int 2025-05-07T19:46:34.0912221Z #define _GLIBCXX11_USE_C99_MATH 1 2025-05-07T19:46:34.0912311Z #define __PTHREAD_SPINS 0, 0 2025-05-07T19:46:34.0912416Z #define _BITS_POSIX1_LIM_H 1 2025-05-07T19:46:34.0912699Z #define cudaStreamAttributeMemSyncDomainMap cudaLaunchAttributeMemSyncDomainMap 2025-05-07T19:46:34.0912789Z #define __DEC128_MAX_EXP__ 6145 2025-05-07T19:46:34.0912954Z #define math_errhandling (MATH_ERRNO | MATH_ERREXCEPT) 2025-05-07T19:46:34.0913037Z #define _T_SIZE 2025-05-07T19:46:34.0913137Z #define cudaHostAllocDefault 0x00 2025-05-07T19:46:34.0913257Z #define _PSTL_PRAGMA_SIMD_EXCLUSIVE_SCAN(PRM) 2025-05-07T19:46:34.0913517Z #define __va_arg_pack() __builtin_va_arg_pack () 2025-05-07T19:46:34.0913608Z #define _POSIX_TIMER_MAX 32 2025-05-07T19:46:34.0913694Z #define _GLIBCXX_HAVE_TLS 1 2025-05-07T19:46:34.0913820Z #define _GLIBCXX_NOTHROW _GLIBCXX_USE_NOEXCEPT 2025-05-07T19:46:34.0913918Z #define __FLT32X_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0914009Z #define __ATOMIC_CONSUME 1 2025-05-07T19:46:34.0914192Z #define __CUDA_ARCH_HAS_FEATURE__(_FEAT) __CUDA_ARCH_FEAT_ ##_FEAT 2025-05-07T19:46:34.0914280Z 
#define __GNUC_MINOR__ 4 2025-05-07T19:46:34.0914380Z #define __GLIBCXX_TYPE_INT_N_0 __int128 2025-05-07T19:46:34.0914472Z #define __INT_FAST16_WIDTH__ 64 2025-05-07T19:46:34.0914595Z #define __UINTMAX_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0914762Z #define __PIE__ 2 2025-05-07T19:46:34.0914865Z #define LITTLE_ENDIAN __LITTLE_ENDIAN 2025-05-07T19:46:34.0914972Z #define _GLIBCXX_HAVE_INT64_T_LONG 1 2025-05-07T19:46:34.0915172Z #define __FLT32X_DENORM_MIN__ 4.94065645841246544176568792868221372e-324F32x 2025-05-07T19:46:34.0915406Z #define __intN_t(N,MODE) typedef int int ##N ##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:34.0915498Z #define __nlink_t_defined 2025-05-07T19:46:34.0915637Z #define _GLIBCXX17_DEPRECATED [[__deprecated__]] 2025-05-07T19:46:34.0915755Z #define _PSTL_STRING(x) _PSTL_STRING_AUX(x) 2025-05-07T19:46:34.0915844Z #define _XOPEN_LIM_H 1 2025-05-07T19:46:34.0916127Z #define __u_intN_t(N,MODE) typedef unsigned int u_int ##N ##_t __attribute__ ((__mode__ (MODE))) 2025-05-07T19:46:34.0916248Z #define __cpp_template_template_args 201611L 2025-05-07T19:46:34.0916355Z #define _GTHREAD_USE_MUTEX_TIMEDLOCK 1 2025-05-07T19:46:34.0916462Z #define BC_DIM_MAX _POSIX2_BC_DIM_MAX 2025-05-07T19:46:34.0916554Z #define __DBL_MAX_10_EXP__ 308 2025-05-07T19:46:34.0916647Z #define __FILE_defined 1 2025-05-07T19:46:34.0916830Z #define __LDBL_DENORM_MIN__ 3.64519953188247460252840593361941982e-4951L 2025-05-07T19:46:34.0916933Z #define _GLIBCXX_HAVE_SINCOS 1 2025-05-07T19:46:34.0917029Z #define __USE_XOPEN_EXTENDED 1 2025-05-07T19:46:34.0917138Z #define __cpp_lib_tuple_element_t 201402L 2025-05-07T19:46:34.0917260Z #define isascii_l(c,l) __isascii_l ((c), (l)) 2025-05-07T19:46:34.0917372Z #define cudaInvalidDeviceId ((int)-2) 2025-05-07T19:46:34.0917475Z #define _GLIBCXX_HAVE_SYS_RESOURCE_H 1 2025-05-07T19:46:34.0917558Z #define __INT16_C(c) c 2025-05-07T19:46:34.0917661Z #define __U32_TYPE unsigned int 2025-05-07T19:46:34.0917760Z #define 
_GLIBCXX_HAVE_SYS_IOCTL_H 1 2025-05-07T19:46:34.0917880Z #define FD_CLR(fd,fdsetp) __FD_CLR (fd, fdsetp) 2025-05-07T19:46:34.0917970Z #define __STDC__ 1 2025-05-07T19:46:34.0918063Z #define _GLIBCXX_HAVE_VWSCANF 1 2025-05-07T19:46:34.0918165Z #define _GLIBCXX_HAVE_EXECINFO_H 1 2025-05-07T19:46:34.0918261Z #define _GLIBCXX_USE_REALPATH 1 2025-05-07T19:46:34.0918421Z #define __attribute_malloc__ __attribute__ ((__malloc__)) 2025-05-07T19:46:34.0918512Z #define __FLT32X_DIG__ 15 2025-05-07T19:46:34.0918611Z #define _GLIBCXX_USE_C99_CTYPE_TR1 1 2025-05-07T19:46:34.0918719Z #define __PTRDIFF_TYPE__ long int 2025-05-07T19:46:34.0918826Z #define cudaArrayDeferredMapping 0x80 2025-05-07T19:46:34.0918934Z #define _GLIBCXX_END_NAMESPACE_CONTAINER 2025-05-07T19:46:34.0919034Z #define USHRT_MAX (SHRT_MAX * 2 + 1) 2025-05-07T19:46:34.0919144Z #define __cpp_lib_is_swappable 201603 2025-05-07T19:46:34.0919225Z #define stdin stdin 2025-05-07T19:46:34.0919315Z #define __ino64_t_defined 2025-05-07T19:46:34.0919408Z #define STA_CLK 0x8000 2025-05-07T19:46:34.0919498Z #define __clockid_t_defined 1 2025-05-07T19:46:34.0919645Z #define _GLIBCXX_NOEXCEPT_IF(...) noexcept(__VA_ARGS__) 2025-05-07T19:46:34.0919812Z #define __attribute_noinline__ __attribute__ ((__noinline__)) 2025-05-07T19:46:34.0919922Z #define __cudaCDP2MemsetAsync 2025-05-07T19:46:34.0920029Z #define _PSTL_PRAGMA_SIMD_SCAN(PRM) 2025-05-07T19:46:34.0920132Z #define _GLIBCXX_BEGIN_NAMESPACE_LDBL 2025-05-07T19:46:34.0920248Z #define _GLIBCXX_TR1_POLY_HERMITE_TCC 1 2025-05-07T19:46:34.0920448Z #define __FD_SET(d,set) ((void) (__FDS_BITS (set)[__FD_ELT (d)] |= __FD_MASK (d))) 2025-05-07T19:46:34.0920596Z #define __ATOMIC_SEQ_CST 5 2025-05-07T19:46:34.0921250Z #define __tobody(c,f,a,args) (__extension__ ({ int __res; if (sizeof (c) > 1) { if (__builtin_constant_p (c)) { int __c = (c); __res = __c < -128 || __c > 255 ? 
__c : (a)[__c]; } else __res = f args; } else __res = (a)[(int) (c)]; __res; })) 2025-05-07T19:46:34.0921328Z #define DOMAIN 1 2025-05-07T19:46:34.0921411Z #define M_LN2 0.69314718055994530942 2025-05-07T19:46:34.0921485Z #define __NVCC__ 1 2025-05-07T19:46:34.0921592Z #define __cudaCDP2Memset2DAsync 2025-05-07T19:46:34.0921695Z #define __CLOCK_T_TYPE __SYSCALL_SLONG_TYPE 2025-05-07T19:46:34.0921785Z #define _PSTL_PRAGMA_SIMD_EARLYEXIT 2025-05-07T19:46:34.0921945Z #define __throw_exception_again throw 2025-05-07T19:46:34.0922030Z #define M_SQRT2 1.41421356237309504880 2025-05-07T19:46:34.0922111Z #define __EXCEPTION_H 1 2025-05-07T19:46:34.0922209Z #define __FLT32X_MIN_10_EXP__ (-307) 2025-05-07T19:46:34.0922307Z #define HUGE_VAL (__builtin_huge_val()) 2025-05-07T19:46:34.0922584Z #define cudaStreamAttributeAccessPolicyWindow cudaLaunchAttributeAccessPolicyWindow 2025-05-07T19:46:34.0922685Z #define __UINTPTR_TYPE__ long unsigned int 2025-05-07T19:46:34.0922787Z #define _GLIBCXX_INLINE_VERSION 0 2025-05-07T19:46:34.0922872Z #define _GLIBCXX_USE_INT128 1 2025-05-07T19:46:34.0922965Z #define __cpp_lib_bool_constant 201505 2025-05-07T19:46:34.0923060Z #define PTHREAD_KEYS_MAX 1024 2025-05-07T19:46:34.0923189Z #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD 2025-05-07T19:46:34.0923284Z #define __FSFILCNT64_T_TYPE __UQUAD_TYPE 2025-05-07T19:46:34.0923383Z #define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 2025-05-07T19:46:34.0923477Z #define __DEC128_MANT_DIG__ 34 2025-05-07T19:46:34.0923572Z #define __cpp_lib_tuples_by_type 201304 2025-05-07T19:46:34.0923657Z #define __LDBL_MIN_10_EXP__ (-4931) 2025-05-07T19:46:34.0923760Z #define __cpp_generic_lambdas 201304L 2025-05-07T19:46:34.0923884Z #define _GLIBCXX_THROW_OR_ABORT(_EXC) (throw (_EXC)) 2025-05-07T19:46:34.0923971Z #define __useconds_t_defined 2025-05-07T19:46:34.0924060Z #define _GLIBCXX_USE_SCHED_YIELD 1 2025-05-07T19:46:34.0924238Z #define __attribute_deprecated__ __attribute__ ((__deprecated__)) 
2025-05-07T19:46:34.0924372Z #define __cpp_lib_type_trait_variable_templates 201510L 2025-05-07T19:46:34.0924448Z #define __SSE_MATH__ 1 2025-05-07T19:46:34.0924536Z #define _IO_wint_t wint_t 2025-05-07T19:46:34.0924621Z #define __SIZEOF_LONG_LONG__ 8 2025-05-07T19:46:34.0924704Z #define _GLIBCXX_VERBOSE 1 2025-05-07T19:46:34.0924786Z #define _GLIBCXX_HAVE_ASINF 1 2025-05-07T19:46:34.0924895Z #define __cpp_user_defined_literals 200809L 2025-05-07T19:46:34.0924981Z #define _GLIBCXX_HAVE_ISINFL 1 2025-05-07T19:46:34.0925068Z #define _GLIBCXX_HAVE_ASINL 1 2025-05-07T19:46:34.0925149Z #define __USE_ATFILE 1 2025-05-07T19:46:34.0925230Z #define _POSIX_OPEN_MAX 20 2025-05-07T19:46:34.0925320Z #define _POSIX_LOGIN_NAME_MAX 9 2025-05-07T19:46:34.0925406Z #define _GCC_PTRDIFF_T 2025-05-07T19:46:34.0925616Z #define cudaKernelNodeAttributePriority cudaLaunchAttributePriority 2025-05-07T19:46:34.0925706Z #define __FLT128_DECIMAL_DIG__ 36 2025-05-07T19:46:34.0925795Z #define _POSIX_THREAD_KEYS_MAX 128 2025-05-07T19:46:34.0925897Z #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 2025-05-07T19:46:34.0925995Z #define __cpp_lib_array_constexpr 201803L 2025-05-07T19:46:34.0926070Z #define _STDLIB_H 1 2025-05-07T19:46:34.0926203Z #define __exctype(name) extern int name (int) __THROW 2025-05-07T19:46:34.0926288Z #define __FLT32_HAS_QUIET_NAN__ 1 2025-05-07T19:46:34.0926373Z #define __FLT_DECIMAL_DIG__ 9 2025-05-07T19:46:34.0926491Z #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL 2025-05-07T19:46:34.0926595Z #define __SURFACE_INDIRECT_FUNCTIONS_H__ 2025-05-07T19:46:34.0926687Z #define __SM_61_INTRINSICS_H__ 2025-05-07T19:46:34.0926858Z #define _GLIBCXX_PACKAGE_STRING "package-unused version-unused" 2025-05-07T19:46:34.0927009Z #define __isxdigit_l(c,l) __isctype_l((c), _ISxdigit, (l)) 2025-05-07T19:46:34.0927465Z #define __glibcxx_requires_nonempty() 2025-05-07T19:46:34.0927570Z #define w_stopsig __wait_stopped.__w_stopsig 2025-05-07T19:46:34.0927653Z #define __ldiv_t_defined 1 
2025-05-07T19:46:34.0927825Z #define __glibcxx_requires_irreflexive_pred(_First,_Last,_Pred) 2025-05-07T19:46:34.0927909Z #define ___int_ptrdiff_t_h 2025-05-07T19:46:34.0928067Z #define __LDBL_NORM_MAX__ 1.18973149535723176502126385303097021e+4932L 2025-05-07T19:46:34.0928167Z #define __cudaCDP2EventDestroy 2025-05-07T19:46:34.0928249Z #define __HOST_DEFINES_H__ 2025-05-07T19:46:34.0928341Z #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 2025-05-07T19:46:34.0928444Z #define __SM_20_ATOMIC_FUNCTIONS_H__ 2025-05-07T19:46:34.0928534Z #define _GLIBCXX_USE_NANOSLEEP 1 2025-05-07T19:46:34.0928660Z #define CUDART_CB 2025-05-07T19:46:34.0928756Z #define BC_BASE_MAX _POSIX2_BC_BASE_MAX 2025-05-07T19:46:34.0928877Z #define _GLIBCXX_USE_C99_INTTYPES_WCHAR_T_TR1 1 2025-05-07T19:46:34.0928962Z #define MB_LEN_MAX 16 2025-05-07T19:46:34.0929177Z #define __glibcxx_requires_partitioned_lower_pred(_First,_Last,_Value,_Pred) 2025-05-07T19:46:34.0929276Z #define _GLIBCXX11_USE_C99_WCHAR 1 2025-05-07T19:46:34.0929392Z #define _IO_peekc(_fp) _IO_peekc_unlocked (_fp) 2025-05-07T19:46:34.0929497Z #define _GLIBCXX_HAVE_AS_SYMVER_DIRECTIVE 1 2025-05-07T19:46:34.0929595Z #define _GLIBCXX_HAVE_UNISTD_H 1 2025-05-07T19:46:34.0929734Z #define __glibc_likely(cond) __builtin_expect((cond), 1) 2025-05-07T19:46:34.0929829Z #define __UINT_FAST8_TYPE__ unsigned char 2025-05-07T19:46:34.0929908Z #define _GNU_SOURCE 1 2025-05-07T19:46:34.0929998Z #define __stub_putmsg 2025-05-07T19:46:34.0930075Z #define __CUDACC__ 1 2025-05-07T19:46:34.0930160Z #define __N(msgid) (msgid) 2025-05-07T19:46:34.0930251Z #define __P(args) args 2025-05-07T19:46:34.0930489Z #define cudaKernelNodeAttributeCooperative cudaLaunchAttributeCooperative 2025-05-07T19:46:34.0930583Z #define __cpp_init_captures 201304L 2025-05-07T19:46:34.0930680Z #define _GLIBCXX17_CONSTEXPR constexpr 2025-05-07T19:46:34.0930778Z #define __ATOMIC_ACQ_REL 4 2025-05-07T19:46:34.0930868Z #define __cpp_lib_as_const 201510 2025-05-07T19:46:34.0930943Z #define 
__WCHAR_T 2025-05-07T19:46:34.0931032Z #define __ATOMIC_RELEASE 3 2025-05-07T19:46:34.0931116Z #define __fsblkcnt_t_defined 2025-05-07T19:46:34.0931220Z #define __cudaCDP2EventCreateWithFlags 2025-05-07T19:46:34.0931315Z #define __DEVICE_DOUBLE_FUNCTIONS_H__ 2025-05-07T19:46:34.0931333Z 2025-05-07T19:46:34.1154328Z 2025-05-07T19:46:34.1155120Z + conda run -n build_binary nvcc --version 2025-05-07T19:46:34.1155140Z 2025-05-07T19:46:35.9542232Z nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:46:35.9542635Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:46:35.9542960Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:46:35.9543422Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:46:35.9543808Z Build cuda_12.8.r12.8/compiler.35404655_0 2025-05-07T19:46:35.9544032Z 2025-05-07T19:46:36.0299345Z 2025-05-07T19:46:36.0306286Z which: no nvidia-smi in (CONDA=/github/home/miniconda:/github/home/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin) 2025-05-07T19:46:36.0307179Z [CHECK] nvidia-smi not found 2025-05-07T19:46:36.0307514Z [INSTALL] Successfully installed CUDA 12.8.0 2025-05-07T19:46:36.0398895Z ##[group]Run . $PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:36.0399475Z . 
$PRELUDE; install_pytorch_pip $BUILD_ENV nightly cuda/12.8.0 2025-05-07T19:46:36.0400052Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:46:36.0400392Z env: 2025-05-07T19:46:36.0400656Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:46:36.0400974Z BUILD_ENV: build_binary 2025-05-07T19:46:36.0401260Z BUILD_TARGET: genai 2025-05-07T19:46:36.0401508Z BUILD_VARIANT: cuda 2025-05-07T19:46:36.0401794Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:46:36.0402056Z ##[endgroup] 2025-05-07T19:46:36.4975645Z ################################################################################ 2025-05-07T19:46:36.4977255Z # Install PyTorch (PIP) 2025-05-07T19:46:36.4977970Z # 2025-05-07T19:46:36.4987962Z # [2025-05-07T19:46:36.498Z] + install_pytorch_pip build_binary nightly cuda/12.8.0 2025-05-07T19:46:36.4989454Z ################################################################################ 2025-05-07T19:46:36.4990134Z 2025-05-07T19:46:36.5026259Z [EXEC] [ATTEMPT 0/3] + conda install -n build_binary -c conda-forge --override-channels -y numpy 2025-05-07T19:46:37.4345803Z Channels: 2025-05-07T19:46:37.4346480Z - conda-forge 2025-05-07T19:46:37.4347121Z Platform: linux-64 2025-05-07T19:46:40.5092650Z Collecting package metadata (repodata.json): - \ | / done 2025-05-07T19:46:42.1446114Z Solving environment: \ | / - done 2025-05-07T19:46:42.4457130Z 2025-05-07T19:46:42.4457700Z ## Package Plan ## 2025-05-07T19:46:42.4458219Z 2025-05-07T19:46:42.4458812Z environment location: /github/home/miniconda/envs/build_binary 2025-05-07T19:46:42.4459771Z 2025-05-07T19:46:42.4460141Z added / updated specs: 2025-05-07T19:46:42.4460854Z - numpy 2025-05-07T19:46:42.4461200Z 2025-05-07T19:46:42.4461214Z 2025-05-07T19:46:42.4461604Z The following packages will be downloaded: 2025-05-07T19:46:42.4462266Z 2025-05-07T19:46:42.4462631Z package | build 2025-05-07T19:46:42.4463838Z ---------------------------|----------------- 2025-05-07T19:46:42.4464993Z libblas-3.9.0 
|31_h59b9bed_openblas 16 KB conda-forge 2025-05-07T19:46:42.4466414Z libcblas-3.9.0 |31_he106b2a_openblas 16 KB conda-forge 2025-05-07T19:46:42.4467828Z liblapack-3.9.0 |31_h7ac8fdf_openblas 16 KB conda-forge 2025-05-07T19:46:42.4468529Z numpy-2.2.5 | py312h72c5963_0 8.1 MB conda-forge 2025-05-07T19:46:42.4468976Z ------------------------------------------------------------ 2025-05-07T19:46:42.4469347Z Total: 8.2 MB 2025-05-07T19:46:42.4469622Z 2025-05-07T19:46:42.4469764Z The following NEW packages will be INSTALLED: 2025-05-07T19:46:42.4470118Z 2025-05-07T19:46:42.4470372Z libblas conda-forge/linux-64::libblas-3.9.0-31_h59b9bed_openblas 2025-05-07T19:46:42.4470892Z libcblas conda-forge/linux-64::libcblas-3.9.0-31_he106b2a_openblas 2025-05-07T19:46:42.4471435Z liblapack conda-forge/linux-64::liblapack-3.9.0-31_h7ac8fdf_openblas 2025-05-07T19:46:42.4471920Z numpy conda-forge/linux-64::numpy-2.2.5-py312h72c5963_0 2025-05-07T19:46:42.4472220Z 2025-05-07T19:46:42.4472224Z 2025-05-07T19:46:42.4472229Z 2025-05-07T19:46:42.4472374Z Downloading and Extracting Packages: ...working... 
2025-05-07T19:46:43.2266241Z  2025-05-07T19:46:43.2266865Z 2025-05-07T19:46:43.2266877Z 2025-05-07T19:46:43.2266930Z 2025-05-07T19:46:43.2267463Z  done 2025-05-07T19:46:43.3270607Z Preparing transaction: | done 2025-05-07T19:46:43.5284934Z Verifying transaction: - \ done 2025-05-07T19:46:43.6296044Z Executing transaction: / done 2025-05-07T19:46:43.7407189Z ################################################################################ 2025-05-07T19:46:43.7407862Z # Install Package From PyTorch PIP: torch 2025-05-07T19:46:43.7408190Z # 2025-05-07T19:46:43.7428062Z # [2025-05-07T19:46:43.742Z] + install_from_pytorch_pip build_binary torch nightly cuda/12.8.0 2025-05-07T19:46:43.7428915Z ################################################################################ 2025-05-07T19:46:43.7429295Z 2025-05-07T19:46:43.7445751Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:46:43.8304775Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:46:43.8305392Z ################################################################################ 2025-05-07T19:46:43.8305764Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:46:43.8306084Z # 2025-05-07T19:46:43.8322252Z # [2025-05-07T19:46:43.831Z] + __prepare_pip_arguments torch nightly cuda/12.8.0 2025-05-07T19:46:43.8323618Z ################################################################################ 2025-05-07T19:46:43.8324283Z 2025-05-07T19:46:43.8346237Z [INSTALL] Extracted package (channel, version): (nightly, LATEST) 2025-05-07T19:46:43.8369433Z [INSTALL] Extracted package variant: cu128 2025-05-07T19:46:43.8384047Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:46:43.8385677Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:46:43.8388970Z [INSTALL] Extracted the full PIP package: --pre torch 2025-05-07T19:46:43.8396732Z [INSTALL] Attempting to install [torch, LATEST] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/cu128/ ... 2025-05-07T19:46:43.8418993Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:37.7093294Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:48:37.7094873Z 2025-05-07T19:48:37.7095105Z Looking in indexes: https://download.pytorch.org/whl/nightly/cu128/ 2025-05-07T19:48:37.7095548Z Collecting torch 2025-05-07T19:48:37.7096270Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB) 2025-05-07T19:48:37.7099423Z Collecting filelock (from torch) 2025-05-07T19:48:37.7100001Z Downloading https://download.pytorch.org/whl/nightly/filelock-3.16.1-py3-none-any.whl (16 kB) 2025-05-07T19:48:37.7101003Z Requirement already satisfied: typing-extensions>=4.10.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from torch) (4.13.2) 2025-05-07T19:48:37.7102225Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from torch) (78.1.1) 2025-05-07T19:48:37.7102968Z Collecting sympy>=1.13.3 (from torch) 2025-05-07T19:48:37.7103641Z Downloading https://download.pytorch.org/whl/nightly/sympy-1.13.3-py3-none-any.whl (6.2 MB) 
2025-05-07T19:48:37.7104633Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 42.0 MB/s eta 0:00:00 2025-05-07T19:48:37.7105030Z Collecting networkx (from torch) 2025-05-07T19:48:37.7105576Z Downloading https://download.pytorch.org/whl/nightly/networkx-3.4.2-py3-none-any.whl (1.7 MB) 2025-05-07T19:48:37.7106284Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 11.6 MB/s eta 0:00:00 2025-05-07T19:48:37.7107036Z Requirement already satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from torch) (3.1.6) 2025-05-07T19:48:37.7107745Z Collecting fsspec (from torch) 2025-05-07T19:48:37.7108264Z Downloading https://download.pytorch.org/whl/nightly/fsspec-2024.10.0-py3-none-any.whl (179 kB) 2025-05-07T19:48:37.7109106Z Collecting nvidia-cuda-nvrtc-cu12==12.8.61 (from torch) 2025-05-07T19:48:37.7110015Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:37.7110905Z Collecting nvidia-cuda-runtime-cu12==12.8.57 (from torch) 2025-05-07T19:48:37.7111822Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:37.7112719Z Collecting nvidia-cuda-cupti-cu12==12.8.57 (from torch) 2025-05-07T19:48:37.7113613Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:37.7114596Z Collecting nvidia-cudnn-cu12==9.8.0.87 (from torch) 2025-05-07T19:48:37.7115287Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) 2025-05-07T19:48:37.7116007Z Collecting nvidia-cublas-cu12==12.8.3.14 (from torch) 2025-05-07T19:48:37.7116713Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:37.7117700Z Collecting nvidia-cufft-cu12==11.3.3.41 (from torch) 2025-05-07T19:48:37.7118540Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:37.7119375Z Collecting nvidia-curand-cu12==10.3.9.55 (from torch) 2025-05-07T19:48:37.7120378Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:37.7121132Z Collecting nvidia-cusolver-cu12==11.7.2.55 (from torch) 2025-05-07T19:48:37.7121906Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:37.7122681Z Collecting nvidia-cusparse-cu12==12.5.7.53 (from torch) 2025-05-07T19:48:37.7123536Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:37.7124398Z Collecting nvidia-cusparselt-cu12==0.6.3 (from torch) 2025-05-07T19:48:37.7125252Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB) 2025-05-07T19:48:37.7126004Z Collecting nvidia-nccl-cu12==2.26.2 (from torch) 2025-05-07T19:48:37.7126834Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) 2025-05-07T19:48:37.7127638Z Collecting nvidia-nvtx-cu12==12.8.55 (from torch) 2025-05-07T19:48:37.7128461Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:37.7129289Z 
Collecting nvidia-nvjitlink-cu12==12.8.61 (from torch) 2025-05-07T19:48:37.7130156Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) 2025-05-07T19:48:37.7131015Z Collecting nvidia-cufile-cu12==1.13.0.11 (from torch) 2025-05-07T19:48:37.7131848Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) 2025-05-07T19:48:37.7132718Z Collecting pytorch-triton==3.3.0+git96316ce5 (from torch) 2025-05-07T19:48:37.7133587Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.6 kB) 2025-05-07T19:48:37.7134471Z Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch) 2025-05-07T19:48:37.7135062Z Downloading https://download.pytorch.org/whl/nightly/mpmath-1.3.0-py3-none-any.whl (536 kB) 2025-05-07T19:48:37.7135867Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 3.2 MB/s eta 0:00:00 2025-05-07T19:48:37.7136629Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from jinja2->torch) (3.0.2) 2025-05-07T19:48:37.7137717Z Downloading https://download.pytorch.org/whl/nightly/cu128/torch-2.8.0.dev20250507%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl (1047.0 MB) 2025-05-07T19:48:37.7138555Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 GB 24.9 MB/s eta 0:00:00 2025-05-07T19:48:37.7139268Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_x86_64.whl (609.6 MB) 2025-05-07T19:48:37.7140045Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 609.6/609.6 MB 44.2 MB/s eta 0:00:00 2025-05-07T19:48:37.7140842Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_cupti_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) 2025-05-07T19:48:37.7141710Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 60.1 MB/s eta 0:00:00 2025-05-07T19:48:37.7142502Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_nvrtc_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) 2025-05-07T19:48:37.7143481Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.0/88.0 MB 85.3 MB/s eta 0:00:00 2025-05-07T19:48:37.7144667Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cuda_runtime_cu12-12.8.57-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) 2025-05-07T19:48:37.7145627Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 954.8/954.8 kB 10.7 MB/s eta 0:00:00 2025-05-07T19:48:37.7146362Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl (698.0 MB) 2025-05-07T19:48:37.7147214Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 698.0/698.0 MB 33.5 MB/s eta 0:00:00 2025-05-07T19:48:37.7148060Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufft_cu12-11.3.3.41-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) 2025-05-07T19:48:37.7148986Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 193.1/193.1 MB 65.2 MB/s eta 0:00:00 2025-05-07T19:48:37.7149829Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cufile_cu12-1.13.0.11-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) 2025-05-07T19:48:37.7150901Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 10.6 MB/s eta 0:00:00 2025-05-07T19:48:37.7151597Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_curand_cu12-10.3.9.55-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) 2025-05-07T19:48:37.7152382Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.6/63.6 MB 66.6 MB/s eta 0:00:00 2025-05-07T19:48:37.7153077Z Downloading 
https://download.pytorch.org/whl/nightly/cu128/nvidia_cusolver_cu12-11.7.2.55-py3-none-manylinux_2_27_x86_64.whl (260.4 MB) 2025-05-07T19:48:37.7153875Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.4/260.4 MB 63.7 MB/s eta 0:00:00 2025-05-07T19:48:37.7154669Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparse_cu12-12.5.7.53-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (292.1 MB) 2025-05-07T19:48:37.7155535Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.1/292.1 MB 63.7 MB/s eta 0:00:00 2025-05-07T19:48:37.7156249Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) 2025-05-07T19:48:37.7157041Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 156.8/156.8 MB 68.4 MB/s eta 0:00:00 2025-05-07T19:48:37.7157808Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) 2025-05-07T19:48:37.7158662Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.3/201.3 MB 57.8 MB/s eta 0:00:00 2025-05-07T19:48:37.7159432Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvjitlink_cu12-12.8.61-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.2 MB) 2025-05-07T19:48:37.7160303Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.2/39.2 MB 60.0 MB/s eta 0:00:00 2025-05-07T19:48:37.7161044Z Downloading https://download.pytorch.org/whl/nightly/cu128/nvidia_nvtx_cu12-12.8.55-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) 2025-05-07T19:48:37.7162221Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.3.0%2Bgit96316ce5-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (153.5 MB) 2025-05-07T19:48:37.7163174Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 153.5/153.5 MB 58.0 MB/s eta 0:00:00 2025-05-07T19:48:37.7164845Z Installing collected packages: nvidia-cusparselt-cu12, mpmath, sympy, pytorch-triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, 
nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, fsspec, filelock, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch 2025-05-07T19:48:37.7166382Z 2025-05-07T19:48:37.7168282Z Successfully installed filelock-3.16.1 fsspec-2024.10.0 mpmath-1.3.0 networkx-3.4.2 nvidia-cublas-cu12-12.8.3.14 nvidia-cuda-cupti-cu12-12.8.57 nvidia-cuda-nvrtc-cu12-12.8.61 nvidia-cuda-runtime-cu12-12.8.57 nvidia-cudnn-cu12-9.8.0.87 nvidia-cufft-cu12-11.3.3.41 nvidia-cufile-cu12-1.13.0.11 nvidia-curand-cu12-10.3.9.55 nvidia-cusolver-cu12-11.7.2.55 nvidia-cusparse-cu12-12.5.7.53 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.8.61 nvidia-nvtx-cu12-12.8.55 pytorch-triton-3.3.0+git96316ce5 sympy-1.13.3 torch-2.8.0.dev20250507+cu128 2025-05-07T19:48:37.7170333Z 2025-05-07T19:48:39.9024639Z torch 2.8.0.dev20250507+cu128 2025-05-07T19:48:39.9028830Z [CHECK] The installed package [torch, nightly/LATEST] is the correct variant (cu128) 2025-05-07T19:48:43.2748082Z [CHECK] Python (sub-)package 'torch.distributed' found ... 2025-05-07T19:48:46.6526675Z [CHECK] NOTE: The installed version is: 2.8.0.dev20250507+cu128 2025-05-07T19:48:46.6527171Z [CHECK] NOTE: Checking _GLIBCXX_USE_CXX11_ABI ... 2025-05-07T19:48:49.9590782Z True 2025-05-07T19:48:49.9591045Z True 2025-05-07T19:48:49.9591182Z 2025-05-07T19:48:50.0347401Z [INSTALL] Successfully installed PyTorch through PyTorch PIP 2025-05-07T19:48:50.0432069Z ##[group]Run if . $PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:48:50.0432740Z if . 
$PRELUDE && which conda; then collect_pytorch_env_info $BUILD_ENV; fi 2025-05-07T19:48:50.0433411Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:48:50.0433758Z env: 2025-05-07T19:48:50.0433991Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:48:50.0434326Z BUILD_ENV: build_binary 2025-05-07T19:48:50.0434582Z BUILD_TARGET: genai 2025-05-07T19:48:50.0434832Z BUILD_VARIANT: cuda 2025-05-07T19:48:50.0435071Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:48:50.0435345Z ##[endgroup] 2025-05-07T19:48:50.4618469Z /github/home/miniconda/bin/conda 2025-05-07T19:48:50.4621749Z ################################################################################ 2025-05-07T19:48:50.4623039Z # Collect PyTorch Environment Information (for Reporting Issues) 2025-05-07T19:48:50.4624365Z # 2025-05-07T19:48:50.4645568Z # [2025-05-07T19:48:50.463Z] + collect_pytorch_env_info build_binary 2025-05-07T19:48:50.4646854Z ################################################################################ 2025-05-07T19:48:50.4647583Z 2025-05-07T19:48:50.4666299Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:48:50.5581955Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:48:50.5586835Z [INFO] Downloading the PyTorch environment info collection script ... 2025-05-07T19:48:50.5587535Z + wget -q https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py 2025-05-07T19:48:50.5587970Z 2025-05-07T19:48:50.6426091Z 2025-05-07T19:48:50.6427577Z [INFO] Collecting PyTorch environment info (will be needed for reporting issues to PyTorch) ... 2025-05-07T19:48:50.6455611Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary python collect_env.py 2025-05-07T19:48:56.1306075Z Collecting environment information... 
2025-05-07T19:48:56.1306487Z PyTorch version: 2.8.0.dev20250507+cu128 2025-05-07T19:48:56.1306864Z Is debug build: False 2025-05-07T19:48:56.1307127Z CUDA used to build PyTorch: 12.8 2025-05-07T19:48:56.1307430Z ROCM used to build PyTorch: N/A 2025-05-07T19:48:56.1307614Z 2025-05-07T19:48:56.1307742Z OS: Amazon Linux 2023.7.20250428 (x86_64) 2025-05-07T19:48:56.1308074Z GCC version: (conda-forge gcc 11.4.0-13) 11.4.0 2025-05-07T19:48:56.1308419Z Clang version: Could not collect 2025-05-07T19:48:56.1308907Z CMake version: version 4.0.2 2025-05-07T19:48:56.1309199Z Libc version: glibc-2.34 2025-05-07T19:48:56.1309357Z 2025-05-07T19:48:56.1309678Z Python version: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] (64-bit runtime) 2025-05-07T19:48:56.1310342Z Python platform: Linux-6.1.130-139.222.amzn2023.x86_64-x86_64-with-glibc2.34 2025-05-07T19:48:56.1310773Z Is CUDA available: False 2025-05-07T19:48:56.1311045Z CUDA runtime version: 12.8.61 2025-05-07T19:48:56.1311329Z CUDA_MODULE_LOADING set to: N/A 2025-05-07T19:48:56.1311643Z GPU models and configuration: Could not collect 2025-05-07T19:48:56.1312302Z Nvidia driver version: Could not collect 2025-05-07T19:48:56.1312611Z cuDNN version: Could not collect 2025-05-07T19:48:56.1312901Z HIP runtime version: N/A 2025-05-07T19:48:56.1313217Z MIOpen runtime version: N/A 2025-05-07T19:48:56.1313479Z Is XNNPACK available: True 2025-05-07T19:48:56.1313642Z 2025-05-07T19:48:56.1313736Z CPU: 2025-05-07T19:48:56.1313949Z Architecture: x86_64 2025-05-07T19:48:56.1314307Z CPU op-mode(s): 32-bit, 64-bit 2025-05-07T19:48:56.1314714Z Address sizes: 46 bits physical, 48 bits virtual 2025-05-07T19:48:56.1315137Z Byte Order: Little Endian 2025-05-07T19:48:56.1315462Z CPU(s): 96 2025-05-07T19:48:56.1315777Z On-line CPU(s) list: 0-95 2025-05-07T19:48:56.1316113Z Vendor ID: GenuineIntel 2025-05-07T19:48:56.1316719Z Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz 2025-05-07T19:48:56.1317126Z CPU 
family: 6 2025-05-07T19:48:56.1317416Z Model: 85 2025-05-07T19:48:56.1317723Z Thread(s) per core: 2 2025-05-07T19:48:56.1318019Z Core(s) per socket: 24 2025-05-07T19:48:56.1318330Z Socket(s): 2 2025-05-07T19:48:56.1318617Z Stepping: 7 2025-05-07T19:48:56.1318939Z BogoMIPS: 5999.99 2025-05-07T19:48:56.1321431Z Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni 2025-05-07T19:48:56.1323769Z Hypervisor vendor: KVM 2025-05-07T19:48:56.1324064Z Virtualization type: full 2025-05-07T19:48:56.1324400Z L1d cache: 1.5 MiB (48 instances) 2025-05-07T19:48:56.1324754Z L1i cache: 1.5 MiB (48 instances) 2025-05-07T19:48:56.1325117Z L2 cache: 48 MiB (48 instances) 2025-05-07T19:48:56.1325458Z L3 cache: 71.5 MiB (2 instances) 2025-05-07T19:48:56.1325784Z NUMA node(s): 2 2025-05-07T19:48:56.1326092Z NUMA node0 CPU(s): 0-23,48-71 2025-05-07T19:48:56.1326408Z NUMA node1 CPU(s): 24-47,72-95 2025-05-07T19:48:56.1326858Z Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status 2025-05-07T19:48:56.1327391Z Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported 2025-05-07T19:48:56.1327864Z Vulnerability L1tf: Mitigation; PTE Inversion 2025-05-07T19:48:56.1328431Z Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown 2025-05-07T19:48:56.1328990Z Vulnerability Meltdown: Mitigation; PTI 2025-05-07T19:48:56.1329576Z Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers 
attempted, no microcode; SMT Host state unknown 2025-05-07T19:48:56.1330139Z Vulnerability Reg file data sampling: Not affected 2025-05-07T19:48:56.1330509Z Vulnerability Retbleed: Vulnerable 2025-05-07T19:48:56.1330858Z Vulnerability Spec rstack overflow: Not affected 2025-05-07T19:48:56.1331218Z Vulnerability Spec store bypass: Vulnerable 2025-05-07T19:48:56.1331735Z Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization 2025-05-07T19:48:56.1332595Z Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline 2025-05-07T19:48:56.1333202Z Vulnerability Srbds: Not affected 2025-05-07T19:48:56.1333544Z Vulnerability Tsx async abort: Not affected 2025-05-07T19:48:56.1333767Z 2025-05-07T19:48:56.1333881Z Versions of relevant libraries: 2025-05-07T19:48:56.1334127Z [pip3] numpy==2.2.5 2025-05-07T19:48:56.1334368Z [pip3] nvidia-cublas-cu12==12.8.3.14 2025-05-07T19:48:56.1334653Z [pip3] nvidia-cuda-cupti-cu12==12.8.57 2025-05-07T19:48:56.1334955Z [pip3] nvidia-cuda-nvrtc-cu12==12.8.61 2025-05-07T19:48:56.1335245Z [pip3] nvidia-cuda-runtime-cu12==12.8.57 2025-05-07T19:48:56.1335548Z [pip3] nvidia-cudnn-cu12==9.8.0.87 2025-05-07T19:48:56.1335830Z [pip3] nvidia-cufft-cu12==11.3.3.41 2025-05-07T19:48:56.1336100Z [pip3] nvidia-curand-cu12==10.3.9.55 2025-05-07T19:48:56.1336394Z [pip3] nvidia-cusolver-cu12==11.7.2.55 2025-05-07T19:48:56.1336677Z [pip3] nvidia-cusparse-cu12==12.5.7.53 2025-05-07T19:48:56.1337076Z [pip3] nvidia-cusparselt-cu12==0.6.3 2025-05-07T19:48:56.1337357Z [pip3] nvidia-nccl-cu12==2.26.2 2025-05-07T19:48:56.1337641Z [pip3] nvidia-nvjitlink-cu12==12.8.61 2025-05-07T19:48:56.1337923Z [pip3] nvidia-nvtx-cu12==12.8.55 2025-05-07T19:48:56.1338211Z [pip3] pytorch-triton==3.3.0+git96316ce5 2025-05-07T19:48:56.1338512Z [pip3] torch==2.8.0.dev20250507+cu128 2025-05-07T19:48:56.1338864Z [conda] cuda-cudart 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:48:56.1339345Z 
[conda] cuda-cudart-dev 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:48:56.1339839Z [conda] cuda-cudart-dev_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:48:56.1340359Z [conda] cuda-cudart-static 12.8.57 h5888daf_1 conda-forge 2025-05-07T19:48:56.1340877Z [conda] cuda-cudart-static_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:48:56.1341408Z [conda] cuda-cudart_linux-64 12.8.57 h3f2d84a_1 conda-forge 2025-05-07T19:48:56.1341889Z [conda] cuda-cupti 12.8.57 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1342338Z [conda] cuda-cupti-dev 12.8.57 h5888daf_0 conda-forge 2025-05-07T19:48:56.1342819Z [conda] cuda-libraries 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:48:56.1343293Z [conda] cuda-libraries-dev 12.8.0 ha770c72_0 conda-forge 2025-05-07T19:48:56.1344022Z [conda] cuda-nvrtc 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1344509Z [conda] cuda-nvrtc-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:48:56.1345005Z [conda] cuda-nvtx 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1345496Z [conda] cuda-opencl 12.8.55 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1345993Z [conda] cuda-opencl-dev 12.8.55 h5888daf_0 conda-forge 2025-05-07T19:48:56.1346515Z [conda] cuda-runtime 12.8.0 ha804496_0 conda-forge 2025-05-07T19:48:56.1346994Z [conda] libcublas 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:48:56.1347492Z [conda] libcublas-dev 12.8.3.14 h9ab20c4_0 conda-forge 2025-05-07T19:48:56.1347988Z [conda] libcufft 11.3.3.41 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1348461Z [conda] libcufft-dev 11.3.3.41 h5888daf_0 conda-forge 2025-05-07T19:48:56.1348957Z [conda] libcurand 10.3.9.55 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1349547Z [conda] libcurand-dev 10.3.9.55 h5888daf_0 conda-forge 2025-05-07T19:48:56.1350140Z [conda] libcusolver 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:48:56.1350603Z [conda] libcusolver-dev 11.7.2.55 h9ab20c4_0 conda-forge 2025-05-07T19:48:56.1351158Z [conda] libcusparse 12.5.7.53 hbd13f7d_0 conda-forge 
2025-05-07T19:48:56.1351637Z [conda] libcusparse-dev 12.5.7.53 h5888daf_0 conda-forge 2025-05-07T19:48:56.1352103Z [conda] libnvjitlink 12.8.61 hbd13f7d_0 conda-forge 2025-05-07T19:48:56.1352589Z [conda] libnvjitlink-dev 12.8.61 h5888daf_0 conda-forge 2025-05-07T19:48:56.1353033Z [conda] numpy 2.2.5 py312h72c5963_0 conda-forge 2025-05-07T19:48:56.1353489Z [conda] nvidia-cublas-cu12 12.8.3.14 pypi_0 pypi 2025-05-07T19:48:56.1353968Z [conda] nvidia-cuda-cupti-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:48:56.1354460Z [conda] nvidia-cuda-nvrtc-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:48:56.1354955Z [conda] nvidia-cuda-runtime-cu12 12.8.57 pypi_0 pypi 2025-05-07T19:48:56.1355485Z [conda] nvidia-cudnn-cu12 9.8.0.87 pypi_0 pypi 2025-05-07T19:48:56.1355959Z [conda] nvidia-cufft-cu12 11.3.3.41 pypi_0 pypi 2025-05-07T19:48:56.1356411Z [conda] nvidia-curand-cu12 10.3.9.55 pypi_0 pypi 2025-05-07T19:48:56.1356892Z [conda] nvidia-cusolver-cu12 11.7.2.55 pypi_0 pypi 2025-05-07T19:48:56.1357377Z [conda] nvidia-cusparse-cu12 12.5.7.53 pypi_0 pypi 2025-05-07T19:48:56.1357857Z [conda] nvidia-cusparselt-cu12 0.6.3 pypi_0 pypi 2025-05-07T19:48:56.1358339Z [conda] nvidia-nccl-cu12 2.26.2 pypi_0 pypi 2025-05-07T19:48:56.1358801Z [conda] nvidia-nvjitlink-cu12 12.8.61 pypi_0 pypi 2025-05-07T19:48:56.1359277Z [conda] nvidia-nvtx-cu12 12.8.55 pypi_0 pypi 2025-05-07T19:48:56.1359737Z [conda] pytorch-triton 3.3.0+git96316ce5 pypi_0 pypi 2025-05-07T19:48:56.1360201Z [conda] torch 2.8.0.dev20250507+cu128 pypi_0 pypi 2025-05-07T19:48:56.1360470Z 2025-05-07T19:48:56.2021520Z ##[group]Run . $PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:48:56.2022287Z . 
$PRELUDE; install_cudnn $BUILD_ENV "$(pwd)/build_only/cudnn" 12.8.0 2025-05-07T19:48:56.2022881Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:48:56.2023247Z env: 2025-05-07T19:48:56.2023617Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:48:56.2024115Z BUILD_ENV: build_binary 2025-05-07T19:48:56.2024443Z BUILD_TARGET: genai 2025-05-07T19:48:56.2024702Z BUILD_VARIANT: cuda 2025-05-07T19:48:56.2024997Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:48:56.2025272Z ##[endgroup] 2025-05-07T19:48:56.6591157Z ################################################################################ 2025-05-07T19:48:56.6591525Z # Install cuDNN 2025-05-07T19:48:56.6591749Z # 2025-05-07T19:48:56.6607974Z # [2025-05-07T19:48:56.660Z] + install_cudnn build_binary /__w/FBGEMM/FBGEMM/build_only/cudnn 12.8.0 2025-05-07T19:48:56.6608620Z ################################################################################ 2025-05-07T19:48:56.6609127Z 2025-05-07T19:48:56.6626771Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:48:56.7472721Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:48:56.7473243Z [INSTALL] cuda_concat_version is determined to be: 128 2025-05-07T19:48:56.7474067Z [INSTALL] Could not find cuDNN URL for the given cuda_concat_version 128; defaulting to cuDNN for CUDA 11.8 2025-05-07T19:48:56.7474631Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:48:56.7474872Z 2025-05-07T19:48:56.7488852Z 2025-05-07T19:48:56.7489141Z + mkdir -p /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:48:56.7489423Z 2025-05-07T19:48:56.7505186Z 2025-05-07T19:48:56.7526540Z [INSTALL] Downloading cuDNN to /tmp/tmp.7lJ3DTpPiE ... 2025-05-07T19:48:56.7546559Z [EXEC] [ATTEMPT 0/3] + wget -q https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz -O cudnn.tar.xz 2025-05-07T19:48:58.7788346Z [INSTALL] Unpacking cuDNN ... 
2025-05-07T19:48:58.7789218Z + tar -xvf cudnn.tar.xz 2025-05-07T19:48:58.7789700Z 2025-05-07T19:48:58.7824894Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/ 2025-05-07T19:48:58.7826003Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/ 2025-05-07T19:48:58.7827305Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static.a 2025-05-07T19:49:01.2287039Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer_static_v8.a 2025-05-07T19:49:01.2288742Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static.a 2025-05-07T19:49:03.5402733Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train_static_v8.a 2025-05-07T19:49:03.5404420Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static.a 2025-05-07T19:49:11.9534714Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer_static_v8.a 2025-05-07T19:49:11.9535825Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static.a 2025-05-07T19:49:13.5867293Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train_static_v8.a 2025-05-07T19:49:13.5868993Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static.a 2025-05-07T19:49:15.3183686Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer_static_v8.a 2025-05-07T19:49:15.3184932Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static.a 2025-05-07T19:49:16.8496434Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train_static_v8.a 2025-05-07T19:49:16.8497993Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8 2025-05-07T19:49:16.8499340Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so 2025-05-07T19:49:16.8500661Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn.so.8.7.0 2025-05-07T19:49:16.8505557Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8 2025-05-07T19:49:16.8506476Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so 2025-05-07T19:49:16.8507038Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_infer.so.8.7.0 2025-05-07T19:49:19.2254599Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8 2025-05-07T19:49:19.2256227Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so 2025-05-07T19:49:19.2257752Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_adv_train.so.8.7.0 2025-05-07T19:49:21.4807577Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so 2025-05-07T19:49:21.4809562Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8 2025-05-07T19:49:21.4811141Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_infer.so.8.7.0 2025-05-07T19:49:29.9968917Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so 2025-05-07T19:49:29.9970524Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8.7.0 2025-05-07T19:49:31.6206450Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_cnn_train.so.8 2025-05-07T19:49:31.6208137Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8.7.0 2025-05-07T19:49:33.3572167Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so 2025-05-07T19:49:34.9069142Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_infer.so.8 2025-05-07T19:49:34.9070736Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8.7.0 2025-05-07T19:49:34.9072282Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so 2025-05-07T19:49:34.9073775Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib/libcudnn_ops_train.so.8 2025-05-07T19:49:34.9074500Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/ 2025-05-07T19:49:34.9074961Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_v8.h 2025-05-07T19:49:34.9075455Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer_v8.h 2025-05-07T19:49:34.9075996Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train_v8.h 2025-05-07T19:49:34.9076838Z 
cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend_v8.h 2025-05-07T19:49:34.9077376Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer_v8.h 2025-05-07T19:49:34.9077891Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train_v8.h 2025-05-07T19:49:34.9078427Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer_v8.h 2025-05-07T19:49:34.9078984Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train_v8.h 2025-05-07T19:49:34.9079499Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version_v8.h 2025-05-07T19:49:34.9080003Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn.h 2025-05-07T19:49:34.9080476Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_infer.h 2025-05-07T19:49:34.9081000Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_adv_train.h 2025-05-07T19:49:34.9081499Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_backend.h 2025-05-07T19:49:34.9082025Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_infer.h 2025-05-07T19:49:34.9082542Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_cnn_train.h 2025-05-07T19:49:34.9083041Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_infer.h 2025-05-07T19:49:34.9083550Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_ops_train.h 2025-05-07T19:49:34.9084043Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include/cudnn_version.h 2025-05-07T19:49:34.9084496Z cudnn-linux-x86_64-8.7.0.84_cuda11-archive/LICENSE 2025-05-07T19:49:34.9099487Z 2025-05-07T19:49:34.9100002Z [INSTALL] Moving cuDNN files to /__w/FBGEMM/FBGEMM/build_only/cudnn ... 
2025-05-07T19:49:34.9100502Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:49:34.9100773Z 2025-05-07T19:49:34.9113666Z 2025-05-07T19:49:34.9113856Z + rm -rf /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:49:34.9114104Z 2025-05-07T19:49:34.9135932Z 2025-05-07T19:49:34.9136484Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/include /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:34.9137183Z 2025-05-07T19:49:34.9172397Z 2025-05-07T19:49:34.9173436Z + mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive/lib /__w/FBGEMM/FBGEMM/build_only/cudnn 2025-05-07T19:49:34.9174587Z 2025-05-07T19:49:36.4378167Z 2025-05-07T19:49:36.4378627Z /__w/FBGEMM/FBGEMM 2025-05-07T19:49:36.4378972Z + rm -rf /tmp/tmp.7lJ3DTpPiE 2025-05-07T19:49:36.4379173Z 2025-05-07T19:49:36.5565343Z 2025-05-07T19:49:36.5569701Z [INSTALL] Set environment variables CUDNN_INCLUDE_DIR and CUDNN_LIBRARY ... 2025-05-07T19:49:36.5570650Z + conda env config vars set -n build_binary CUDNN_INCLUDE_DIR=/__w/FBGEMM/FBGEMM/build_only/cudnn/include CUDNN_LIBRARY=/__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:49:36.5571299Z 2025-05-07T19:49:36.9644794Z 2025-05-07T19:49:36.9645378Z [INSTALL] Successfully installed cuDNN (for CUDA 12.8.0) 2025-05-07T19:49:36.9706424Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:49:36.9707042Z . 
$PRELUDE; cd fbgemm_gpu; prepare_fbgemm_gpu_build $BUILD_ENV 2025-05-07T19:49:36.9707625Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:36.9707962Z env: 2025-05-07T19:49:36.9708188Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:36.9708510Z BUILD_ENV: build_binary 2025-05-07T19:49:36.9709000Z BUILD_TARGET: genai 2025-05-07T19:49:36.9709250Z BUILD_VARIANT: cuda 2025-05-07T19:49:36.9709500Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:36.9709921Z ##[endgroup] 2025-05-07T19:49:37.4293532Z ################################################################################ 2025-05-07T19:49:37.4294485Z # Prepare FBGEMM-GPU Build 2025-05-07T19:49:37.4294843Z # 2025-05-07T19:49:37.4307910Z # [2025-05-07T19:49:37.430Z] + prepare_fbgemm_gpu_build build_binary 2025-05-07T19:49:37.4308443Z ################################################################################ 2025-05-07T19:49:37.4308860Z 2025-05-07T19:49:37.4331032Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:37.5160164Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:37.5178706Z [BUILD] Running git submodules update ... 
2025-05-07T19:49:37.5203604Z [EXEC] [ATTEMPT 0/3] + git submodule sync 2025-05-07T19:49:37.5556219Z Synchronizing submodule url for '../external/asmjit' 2025-05-07T19:49:37.5556733Z Synchronizing submodule url for '../external/composable_kernel' 2025-05-07T19:49:37.5557232Z Synchronizing submodule url for '../external/cpuinfo' 2025-05-07T19:49:37.5557652Z Synchronizing submodule url for '../external/cutlass' 2025-05-07T19:49:37.5558092Z Synchronizing submodule url for '../external/googletest' 2025-05-07T19:49:37.5558561Z Synchronizing submodule url for '../external/hipify_torch' 2025-05-07T19:49:37.5558997Z Synchronizing submodule url for '../external/json' 2025-05-07T19:49:37.5586390Z [EXEC] [ATTEMPT 0/3] + git submodule update --init --recursive 2025-05-07T19:49:37.6059851Z [BUILD] Installing other build dependencies ... 2025-05-07T19:49:37.6081099Z [EXEC] [ATTEMPT 0/3] + conda run --no-capture-output -n build_binary python -m pip install -r requirements.txt 2025-05-07T19:49:39.7047856Z Collecting backports.tarfile (from -r requirements.txt (line 13)) 2025-05-07T19:49:39.7218059Z Downloading backports.tarfile-1.2.0-py3-none-any.whl.metadata (2.0 kB) 2025-05-07T19:49:39.7302751Z Requirement already satisfied: build in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 14)) (1.2.2.post1) 2025-05-07T19:49:39.8338449Z Collecting cmake (from -r requirements.txt (line 15)) 2025-05-07T19:49:39.8369415Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.3 kB) 2025-05-07T19:49:39.8443763Z Requirement already satisfied: click in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 16)) (8.1.8) 2025-05-07T19:49:39.8445699Z Requirement already satisfied: hypothesis in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 17)) (6.131.14) 2025-05-07T19:49:39.8447141Z Requirement already 
satisfied: jinja2 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 18)) (3.1.6) 2025-05-07T19:49:39.8448586Z Requirement already satisfied: mpmath==1.3.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 19)) (1.3.0) 2025-05-07T19:49:39.8830681Z Collecting ninja (from -r requirements.txt (line 20)) 2025-05-07T19:49:39.8859883Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.0 kB) 2025-05-07T19:49:39.8929331Z Requirement already satisfied: numpy>=2.0.2 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 21)) (2.2.5) 2025-05-07T19:49:39.9066364Z Collecting pyre-extensions (from -r requirements.txt (line 22)) 2025-05-07T19:49:39.9097822Z Downloading pyre_extensions-0.0.32-py3-none-any.whl.metadata (4.0 kB) 2025-05-07T19:49:39.9167481Z Requirement already satisfied: pyyaml in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 23)) (6.0.2) 2025-05-07T19:49:39.9171425Z Requirement already satisfied: scikit-build in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 24)) (0.18.1) 2025-05-07T19:49:39.9175431Z Requirement already satisfied: setuptools in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from -r requirements.txt (line 25)) (78.1.1) 2025-05-07T19:49:39.9369518Z Collecting setuptools_git_versioning (from -r requirements.txt (line 26)) 2025-05-07T19:49:39.9394584Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl.metadata (6.1 kB) 2025-05-07T19:49:39.9581911Z Collecting tabulate (from -r requirements.txt (line 27)) 2025-05-07T19:49:39.9609824Z Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB) 2025-05-07T19:49:39.9857264Z Collecting patchelf (from -r requirements.txt (line 28)) 2025-05-07T19:49:39.9885280Z 
Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl.metadata (3.5 kB) 2025-05-07T19:49:39.9981898Z Requirement already satisfied: packaging>=19.1 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from build->-r requirements.txt (line 14)) (25.0) 2025-05-07T19:49:39.9986232Z Requirement already satisfied: pyproject_hooks in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from build->-r requirements.txt (line 14)) (1.2.0) 2025-05-07T19:49:40.0029641Z Requirement already satisfied: attrs>=22.2.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from hypothesis->-r requirements.txt (line 17)) (25.3.0) 2025-05-07T19:49:40.0040051Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from hypothesis->-r requirements.txt (line 17)) (2.4.0) 2025-05-07T19:49:40.0089620Z Requirement already satisfied: MarkupSafe>=2.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from jinja2->-r requirements.txt (line 18)) (3.0.2) 2025-05-07T19:49:40.0212677Z Collecting typing-inspect (from pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:49:40.0258791Z Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB) 2025-05-07T19:49:40.0330503Z Requirement already satisfied: typing-extensions in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from pyre-extensions->-r requirements.txt (line 22)) (4.13.2) 2025-05-07T19:49:40.0345760Z Requirement already satisfied: distro in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from scikit-build->-r requirements.txt (line 24)) (1.9.0) 2025-05-07T19:49:40.0353123Z Requirement already satisfied: wheel>=0.32.0 in /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages (from scikit-build->-r requirements.txt (line 24)) (0.45.1) 
2025-05-07T19:49:40.0625309Z Collecting mypy-extensions>=0.3.0 (from typing-inspect->pyre-extensions->-r requirements.txt (line 22)) 2025-05-07T19:49:40.0652739Z Downloading mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) 2025-05-07T19:49:40.0759659Z Downloading backports.tarfile-1.2.0-py3-none-any.whl (30 kB) 2025-05-07T19:49:40.0849438Z Downloading cmake-4.0.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.9 MB) 2025-05-07T19:49:40.2025311Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.9/27.9 MB 241.8 MB/s eta 0:00:00 2025-05-07T19:49:40.2058733Z Downloading ninja-1.11.1.4-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB) 2025-05-07T19:49:40.2139084Z Downloading pyre_extensions-0.0.32-py3-none-any.whl (12 kB) 2025-05-07T19:49:40.2200794Z Downloading setuptools_git_versioning-2.1.0-py3-none-any.whl (10 kB) 2025-05-07T19:49:40.2251495Z Downloading tabulate-0.9.0-py3-none-any.whl (35 kB) 2025-05-07T19:49:40.2307516Z Downloading patchelf-0.17.2.2-py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.musllinux_1_1_x86_64.whl (466 kB) 2025-05-07T19:49:40.2388294Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB) 2025-05-07T19:49:40.2464203Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) 2025-05-07T19:49:40.4025799Z Installing collected packages: tabulate, setuptools_git_versioning, patchelf, ninja, mypy-extensions, cmake, backports.tarfile, typing-inspect, pyre-extensions 2025-05-07T19:49:41.2675273Z 2025-05-07T19:49:41.2720807Z Successfully installed backports.tarfile-1.2.0 cmake-4.0.0 mypy-extensions-1.1.0 ninja-1.11.1.4 patchelf-0.17.2.2 pyre-extensions-0.0.32 setuptools_git_versioning-2.1.0 tabulate-0.9.0 typing-inspect-0.9.0 2025-05-07T19:49:41.2724272Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. 
Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2025-05-07T19:49:41.4186832Z ################################################################################ 2025-05-07T19:49:41.4187287Z # Install PyTorch (PyTorch PIP) 2025-05-07T19:49:41.4187581Z # 2025-05-07T19:49:41.4215899Z # [2025-05-07T19:49:41.420Z] + install_triton_pip build_binary 2025-05-07T19:49:41.4216567Z ################################################################################ 2025-05-07T19:49:41.4216851Z 2025-05-07T19:49:41.4217090Z [BUILD] Installing pytorch-triton nightly/3.2.0+git4b3bb1f8 from PIP ... 2025-05-07T19:49:41.4217554Z ################################################################################ 2025-05-07T19:49:41.4217941Z # Install Package From PyTorch PIP: pytorch-triton 2025-05-07T19:49:41.4218272Z # 2025-05-07T19:49:41.4241253Z # [2025-05-07T19:49:41.423Z] + install_from_pytorch_pip build_binary pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:49:41.4241844Z ################################################################################ 2025-05-07T19:49:41.4242108Z 2025-05-07T19:49:41.4259283Z [EXEC] [ATTEMPT 0/3] + wget -q --timeout 1 pypi.org -O /dev/null 2025-05-07T19:49:41.5081773Z [CHECK] Network does not appear to be blocked. 2025-05-07T19:49:41.5082861Z ################################################################################ 2025-05-07T19:49:41.5083851Z # Prepare PIP Arguments (PyTorch PIP) 2025-05-07T19:49:41.5084676Z # 2025-05-07T19:49:41.5103982Z # [2025-05-07T19:49:41.509Z] + __prepare_pip_arguments pytorch-triton nightly/3.2.0+git4b3bb1f8 2025-05-07T19:49:41.5105585Z ################################################################################ 2025-05-07T19:49:41.5105820Z 2025-05-07T19:49:41.5150290Z [INSTALL] Extracted package (channel, version): (nightly, 3.2.0+git4b3bb1f8) 2025-05-07T19:49:41.5166269Z [INSTALL] Using a non-RELEASE channel: nightly ... 
2025-05-07T19:49:41.5166943Z [INSTALL] Extracted the full PIP channel: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:41.5170221Z [INSTALL] Extracted the full PIP package: --pre pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:49:41.5178012Z [INSTALL] Attempting to install [pytorch-triton, 3.2.0+git4b3bb1f8] from PyTorch PIP using channel https://download.pytorch.org/whl/nightly/ ... 2025-05-07T19:49:41.5203131Z [EXEC] [ATTEMPT 0/3] + conda run -n build_binary pip install --pre pytorch-triton==3.2.0+git4b3bb1f8 --index-url https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:46.8472825Z ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. 2025-05-07T19:49:46.8475556Z Looking in indexes: https://download.pytorch.org/whl/nightly/ 2025-05-07T19:49:46.8477765Z torch 2.8.0.dev20250507+cu128 requires pytorch-triton==3.3.0+git96316ce5; platform_system == "Linux", but you have pytorch-triton 3.2.0+git4b3bb1f8 which is incompatible. 2025-05-07T19:49:46.8479652Z WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 
2025-05-07T19:49:46.8481005Z 2025-05-07T19:49:46.8481131Z Collecting pytorch-triton==3.2.0+git4b3bb1f8 2025-05-07T19:49:46.8481944Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.3 kB) 2025-05-07T19:49:46.8483167Z Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-3.2.0%2Bgit4b3bb1f8-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (166.5 MB) 2025-05-07T19:49:46.8484346Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.5/166.5 MB 210.2 MB/s eta 0:00:00 2025-05-07T19:49:46.8484724Z Installing collected packages: pytorch-triton 2025-05-07T19:49:46.8485246Z Attempting uninstall: pytorch-triton 2025-05-07T19:49:46.8485621Z Found existing installation: pytorch-triton 3.3.0+git96316ce5 2025-05-07T19:49:46.8486055Z Uninstalling pytorch-triton-3.3.0+git96316ce5: 2025-05-07T19:49:46.8486483Z Successfully uninstalled pytorch-triton-3.3.0+git96316ce5 2025-05-07T19:49:46.8486908Z Successfully installed pytorch-triton-3.2.0+git4b3bb1f8 2025-05-07T19:49:46.8487162Z 2025-05-07T19:49:48.9451803Z [CHECK] Python (sub-)package 'triton' found ... 2025-05-07T19:49:48.9452751Z [CHECK] Printing out the pytorch-triton version ... 2025-05-07T19:49:50.9894893Z ################################################################################ 2025-05-07T19:49:50.9896214Z [CHECK] The installed VERSION of pytorch-triton is: 3.2.0 2025-05-07T19:49:50.9897006Z ################################################################################ 2025-05-07T19:49:50.9897262Z 2025-05-07T19:49:52.9355756Z [CHECK] Python (sub-)package 'numpy' found ... 2025-05-07T19:49:55.0024776Z [CHECK] Python (sub-)package 'skbuild' found ... 2025-05-07T19:49:55.0025269Z [BUILD] Successfully ran git submodules update 2025-05-07T19:49:55.0100060Z ##[group]Run . $PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:49:55.0100777Z . 
$PRELUDE; cd fbgemm_gpu; build_fbgemm_gpu_package $BUILD_ENV nightly genai/cuda 2025-05-07T19:49:55.0101385Z shell: bash --noprofile --norc -e -o pipefail {0} 2025-05-07T19:49:55.0101716Z env: 2025-05-07T19:49:55.0101937Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T19:49:55.0102252Z BUILD_ENV: build_binary 2025-05-07T19:49:55.0102494Z BUILD_TARGET: genai 2025-05-07T19:49:55.0102731Z BUILD_VARIANT: cuda 2025-05-07T19:49:55.0102981Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T19:49:55.0103229Z ##[endgroup] 2025-05-07T19:49:55.4667910Z [BUILD] BUILD_TARGET_VARIANT: genai/cuda 2025-05-07T19:49:55.4668919Z [BUILD] Extracted build target: genai 2025-05-07T19:49:55.4669832Z [BUILD] Extracted build variant: cuda 2025-05-07T19:49:57.2717283Z /github/home/miniconda/envs/build_binary/bin/cc 2025-05-07T19:49:57.2717661Z 2025-05-07T19:49:57.3486308Z [CHECK] Binary cc found in PATH 2025-05-07T19:49:59.1792254Z /github/home/miniconda/envs/build_binary/bin/gcc 2025-05-07T19:49:59.1792611Z 2025-05-07T19:49:59.2594925Z [CHECK] Binary gcc found in PATH 2025-05-07T19:50:01.0753805Z /github/home/miniconda/envs/build_binary/bin/c++ 2025-05-07T19:50:01.0754159Z 2025-05-07T19:50:01.1516001Z [CHECK] Binary c++ found in PATH 2025-05-07T19:50:02.9766469Z /github/home/miniconda/envs/build_binary/bin/g++ 2025-05-07T19:50:02.9767309Z 2025-05-07T19:50:03.0444707Z [CHECK] Binary g++ found in PATH 2025-05-07T19:50:04.9288283Z [BUILD] Extracted and set Python tag: py312 2025-05-07T19:50:04.9288823Z [BUILD] Extracted and set Python platform name: manylinux_2_28_x86_64 2025-05-07T19:50:04.9526163Z core = 24 2025-05-07T19:50:04.9776053Z sockets = 2 2025-05-07T19:50:04.9776464Z [BUILD] Set multicore run option for setup.py: -j 48 2025-05-07T19:50:04.9776926Z [CHECK] LD_LIBRARY_PATH = 2025-05-07T19:50:04.9777221Z [BUILD] Running pre-build cleanups ... 
2025-05-07T19:50:04.9777742Z + rm -rf dist 2025-05-07T19:50:04.9777884Z 2025-05-07T19:50:04.9788105Z 2025-05-07T19:50:04.9788381Z + conda run --no-capture-output -n build_binary python setup.py clean 2025-05-07T19:50:04.9788745Z 2025-05-07T19:50:08.1458420Z INFO:root:running clean 2025-05-07T19:50:08.1458796Z [SETUP.PY] ARGV: ['setup.py', 'clean'] 2025-05-07T19:50:08.1459887Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:08.1460954Z [SETUP.PY] Other arguments: ['clean'] 2025-05-07T19:50:08.1461449Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:08.1462051Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:08.1462930Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:08.1463620Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:08.1464026Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:08.1465282Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:08.4580782Z 2025-05-07T19:50:08.4581336Z [BUILD] Printing git status ... 2025-05-07T19:50:08.4581701Z + git status 2025-05-07T19:50:08.4581840Z 2025-05-07T19:50:09.1822327Z HEAD detached at pull/4066/merge 2025-05-07T19:50:09.1823240Z Untracked files: 2025-05-07T19:50:09.1824398Z (use "git add <file>..."
to include in what will be committed) 2025-05-07T19:50:09.1825439Z ../build_only/ 2025-05-07T19:50:09.1826107Z ../collect_env.py 2025-05-07T19:50:09.1826768Z fbgemm_gpu/docs/version.py 2025-05-07T19:50:09.1827259Z 2025-05-07T19:50:09.1827953Z nothing added to commit but untracked files present (use "git add" to track) 2025-05-07T19:50:09.1828324Z 2025-05-07T19:50:09.1828429Z + git diff 2025-05-07T19:50:09.1828553Z 2025-05-07T19:50:09.2117118Z 2025-05-07T19:50:09.2117401Z ################################################################################ 2025-05-07T19:50:09.2117810Z # Configure FBGEMM-GPU Build 2025-05-07T19:50:09.2118079Z # 2025-05-07T19:50:09.2144740Z # [2025-05-07T19:50:09.213Z] + __configure_fbgemm_gpu_build 2025-05-07T19:50:09.2145182Z ################################################################################ 2025-05-07T19:50:09.2145421Z 2025-05-07T19:50:09.2150233Z [BUILD] Setting the build target: genai ... 2025-05-07T19:50:09.2150708Z [BUILD] Configuring build as CUDA variant (this is the default behavior) ... 
2025-05-07T19:50:11.0176266Z /github/home/miniconda/envs/build_binary/bin/nvcc 2025-05-07T19:50:11.0176625Z 2025-05-07T19:50:11.0732538Z [CHECK] Binary nvcc found in PATH 2025-05-07T19:50:12.9259626Z /__w/FBGEMM/FBGEMM/build_only/cudnn/include 2025-05-07T19:50:12.9259960Z 2025-05-07T19:50:13.0018922Z [CHECK] Environment variable CUDNN_INCLUDE_DIR is defined in the Conda environment 2025-05-07T19:50:14.8000933Z /__w/FBGEMM/FBGEMM/build_only/cudnn/lib 2025-05-07T19:50:14.8001217Z 2025-05-07T19:50:14.8574032Z [CHECK] Environment variable CUDNN_LIBRARY is defined in the Conda environment 2025-05-07T19:50:16.6575139Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:16.6575539Z 2025-05-07T19:50:16.7145634Z [CHECK] Environment variable NVML_LIB_PATH is defined in the Conda environment 2025-05-07T19:50:18.5626757Z [BUILD] Using the default architectures for CUDA nvcc: NVIDIA (R) Cuda compiler driver 2025-05-07T19:50:18.5627336Z Copyright (c) 2005-2025 NVIDIA Corporation 2025-05-07T19:50:18.5627685Z Built on Wed_Jan_15_19:20:09_PST_2025 2025-05-07T19:50:18.5628043Z Cuda compilation tools, release 12.8, V12.8.61 2025-05-07T19:50:18.5628435Z Build cuda_12.8.r12.8/compiler.35404655_0 ... 2025-05-07T19:50:18.5628865Z [BUILD] Setting the following CUDA targets: 7.0;8.0;9.0;9.0a;10.0a;12.0a 2025-05-07T19:50:18.5629298Z [BUILD] Looking up NVML filepath ... 2025-05-07T19:50:20.4088900Z [BUILD] Looking up NCCL filepath ... 2025-05-07T19:50:24.2587205Z [BUILD] Setting NVCC verbose mode ... 2025-05-07T19:50:24.2587675Z + conda env config vars set -n build_binary NVCC_VERBOSE=1 2025-05-07T19:50:24.2587960Z 2025-05-07T19:50:24.6945278Z 2025-05-07T19:50:24.6945843Z [BUILD] Setting CUDA build args ... 2025-05-07T19:50:26.6267871Z [BUILD] Looking up CUDA version ... 
2025-05-07T19:50:30.4628572Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:50:30.4628889Z 2025-05-07T19:50:32.3607670Z 2025-05-07T19:50:32.3608467Z [BUILD] Setting NVCC flags ... 2025-05-07T19:50:32.3609735Z + conda env config vars set -n build_binary NVCC_PREPEND_FLAGS="-std=c++20 -Xcompiler -std=c++20 -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler" 2025-05-07T19:50:32.3610955Z 2025-05-07T19:50:32.7777418Z 2025-05-07T19:50:32.7777831Z + conda run -n build_binary printenv NVCC_PREPEND_FLAGS 2025-05-07T19:50:32.7778145Z 2025-05-07T19:50:34.5906415Z -std=c++20 -Xcompiler -std=c++20 -ccbin /github/home/miniconda/envs/build_binary/bin/c++ -allow-unsupported-compiler 2025-05-07T19:50:34.5906958Z 2025-05-07T19:50:34.6671646Z 2025-05-07T19:50:34.6672161Z [BUILD] Setting CUDA build args ... 2025-05-07T19:50:34.6672584Z + conda run -n build_binary c++ --version 2025-05-07T19:50:34.6672806Z 2025-05-07T19:50:36.4432408Z c++ (conda-forge gcc 11.4.0-13) 11.4.0 2025-05-07T19:50:36.4432837Z Copyright (C) 2021 Free Software Foundation, Inc. 2025-05-07T19:50:36.4433318Z This is free software; see the source for copying conditions. There is NO 2025-05-07T19:50:36.4433889Z warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 2025-05-07T19:50:36.4434299Z 2025-05-07T19:50:36.4434304Z 2025-05-07T19:50:36.5005237Z 2025-05-07T19:50:36.5006263Z + conda run -n build_binary c++ --version | grep -i clang 2025-05-07T19:50:36.5006597Z 2025-05-07T19:50:38.3549614Z 2025-05-07T19:50:38.3550092Z [BUILD] Enabling debug features in the build ... 
2025-05-07T19:50:38.3550706Z .github/scripts/fbgemm_gpu_build.bash: line 370: [: : integer expression expected 2025-05-07T19:50:38.3553268Z [BUILD] FBGEMM_GPU build arguments have been set: --verbose --build-target=genai --build-variant=cuda --nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 -DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 --debug 2025-05-07T19:50:38.3555721Z ################################################################################ 2025-05-07T19:50:38.3556096Z # Build FBGEMM-GPU Package (Wheel) 2025-05-07T19:50:38.3556392Z # 2025-05-07T19:50:38.3574521Z # [2025-05-07T19:50:38.356Z] + build_fbgemm_gpu_package build_binary nightly genai/cuda 2025-05-07T19:50:38.3575406Z ################################################################################ 2025-05-07T19:50:38.3575701Z 2025-05-07T19:50:38.3575907Z [BUILD] Building FBGEMM wheel (TARGET=genai, VARIANT=cuda) ... 
2025-05-07T19:50:38.3580506Z + conda run --no-capture-output -n build_binary python -m build --wheel --no-isolation --config-setting=--build-option=--verbose --config-setting=--build-option=--build-target=genai --config-setting=--build-option=--build-variant=cuda --config-setting=--build-option=--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so --config-setting=--build-option=--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 --config-setting=--build-option=-DTORCH_CUDA_ARCH_LIST='7.0;8.0;9.0;9.0a;10.0a;12.0a' --config-setting=--build-option=-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux --config-setting=--build-option=-DCMAKE_CXX_STANDARD=20 --config-setting=--build-option=--debug --config-setting=--build-option=--package_channel=nightly --config-setting=--build-option=--python-tag=py312 --config-setting=--build-option=--plat-name=manylinux_2_28_x86_64 2025-05-07T19:50:38.3585661Z 2025-05-07T19:50:40.2483101Z * Getting build dependencies for wheel... 
2025-05-07T19:50:41.6525453Z INFO:root:running egg_info 2025-05-07T19:50:41.6564757Z INFO:root:creating fbgemm_gpu_nightly.egg-info 2025-05-07T19:50:41.6565216Z INFO:root:writing fbgemm_gpu_nightly.egg-info/PKG-INFO 2025-05-07T19:50:41.6566979Z INFO:root:writing dependency_links to fbgemm_gpu_nightly.egg-info/dependency_links.txt 2025-05-07T19:50:41.6568719Z INFO:root:writing requirements to fbgemm_gpu_nightly.egg-info/requires.txt 2025-05-07T19:50:41.6569551Z INFO:root:writing top-level names to fbgemm_gpu_nightly.egg-info/top_level.txt 2025-05-07T19:50:41.6570576Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:50:41.6642223Z INFO:root:reading manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:50:41.6651652Z INFO:root:writing manifest file 'fbgemm_gpu_nightly.egg-info/SOURCES.txt' 2025-05-07T19:50:41.6739446Z [SETUP.PY] ARGV: ['setup.py', 'egg_info'] 2025-05-07T19:50:41.6740662Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=False, debug=False, dryrun=False, build_target='default', build_variant='cuda', package_channel='nightly', nvml_lib_path=None, nccl_lib_path=None, use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:41.6741787Z [SETUP.PY] Other arguments: ['egg_info'] 2025-05-07T19:50:41.6742289Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:41.6743072Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:41.6743760Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:41.6744332Z [SETUP.PY] Setting the FBGEMM build target: default ... 2025-05-07T19:50:41.6744730Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 
2025-05-07T19:50:41.6745987Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DFBGEMM_BUILD_TARGET=default', '-DFBGEMM_BUILD_VARIANT=cuda', "-DCMAKE_C_FLAGS=''", "-DCMAKE_CXX_FLAGS=''"] 2025-05-07T19:50:42.0151408Z * Building wheel... 2025-05-07T19:50:43.3967338Z [SETUP.PY] ARGV: ['setup.py', 'bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-9u1c9m79', '--verbose', '--build-target=genai', '--build-variant=cuda', '--nvml_lib_path=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '--nccl_lib_path=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--debug', '--package_channel=nightly', '--python-tag=py312', '--plat-name=manylinux_2_28_x86_64'] 2025-05-07T19:50:43.3971779Z [SETUP.PY] Parsed setup.py arguments: Namespace(verbose=True, debug=True, dryrun=False, build_target='genai', build_variant='cuda', package_channel='nightly', nvml_lib_path='/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', nccl_lib_path='/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2', use_fb_only=False, cxxprefix=None) 2025-05-07T19:50:43.3974890Z [SETUP.PY] Other arguments: ['bdist_wheel', '--dist-dir', '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-9u1c9m79', '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20', '--python-tag=py312', '--plat-name=manylinux_2_28_x86_64'] 
2025-05-07T19:50:43.3976772Z [SETUP.PY] CUDA CUB directory environment variable not set. Using default CUB location. 2025-05-07T19:50:43.3977452Z [SETUP.PY] Using CUDA = /github/home/miniconda/envs/build_binary 2025-05-07T19:50:43.3978100Z [SETUP.PY] Generating version file at: /__w/FBGEMM/FBGEMM/fbgemm_gpu/fbgemm_gpu/docs/version.py 2025-05-07T19:50:43.3978621Z [SETUP.PY] Setting the FBGEMM build target: genai ... 2025-05-07T19:50:43.3978983Z [SETUP.PY] Setting the FBGEMM build variant: cuda ... 2025-05-07T19:50:43.3983572Z [SETUP.PY] Passing CMake arguments: ['-DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DCMAKE_VERBOSE_MAKEFILE=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE', '-DFBGEMM_BUILD_TARGET=genai', '-DFBGEMM_BUILD_VARIANT=cuda', '-DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so', '-DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include', '-DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2', "-DCMAKE_C_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib'", "-DCMAKE_CXX_FLAGS='-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib'", '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a', '-DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux', '-DCMAKE_CXX_STANDARD=20'] 2025-05-07T19:50:43.3988890Z 2025-05-07T19:50:43.3988895Z 2025-05-07T19:50:43.3989209Z -------------------------------------------------------------------------------- 2025-05-07T19:50:43.3989612Z -- Trying 'Ninja' generator 2025-05-07T19:50:43.3989985Z -------------------------------- 
2025-05-07T19:50:43.3990248Z --------------------------- 2025-05-07T19:50:43.3990476Z ---------------------- 2025-05-07T19:50:43.3990703Z ----------------- 2025-05-07T19:50:43.3990900Z ------------ 2025-05-07T19:50:43.3991101Z ------- 2025-05-07T19:50:43.3991278Z -- 2025-05-07T19:50:43.4377367Z CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): 2025-05-07T19:50:43.4378018Z Compatibility with CMake < 3.10 will be removed from a future version of 2025-05-07T19:50:43.4378433Z CMake. 2025-05-07T19:50:43.4378553Z 2025-05-07T19:50:43.4378797Z Update the VERSION argument <min> value. Or, use the <min>...<max> syntax 2025-05-07T19:50:43.4379387Z to tell CMake that the project requires at least <min> but has been updated 2025-05-07T19:50:43.4379884Z to work with policies introduced by <max> or earlier. 2025-05-07T19:50:43.4380137Z 2025-05-07T19:50:43.4380141Z 2025-05-07T19:50:43.4380345Z Not searching for unused variables given on the command line. 2025-05-07T19:50:43.4819866Z -- The C compiler identification is GNU 11.4.0 2025-05-07T19:50:43.4906159Z -- Detecting C compiler ABI info 2025-05-07T19:50:43.5860240Z -- Detecting C compiler ABI info - done 2025-05-07T19:50:43.6040277Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc - skipped 2025-05-07T19:50:43.6042126Z -- Detecting C compile features 2025-05-07T19:50:43.6044295Z -- Detecting C compile features - done 2025-05-07T19:50:43.6847962Z -- The CXX compiler identification is GNU 11.4.0 2025-05-07T19:50:43.6920961Z -- Detecting CXX compiler ABI info 2025-05-07T19:50:43.7902136Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:50:43.8096247Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ - skipped 2025-05-07T19:50:43.8096957Z -- Detecting CXX compile features 2025-05-07T19:50:43.8106048Z -- Detecting CXX compile features - done 2025-05-07T19:50:43.8180662Z -- Configuring done (0.4s)
2025-05-07T19:50:43.8232880Z -- Generating done (0.0s) 2025-05-07T19:50:43.8253015Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_cmake_test_compile/build 2025-05-07T19:50:43.8300320Z -- 2025-05-07T19:50:43.8300661Z ------- 2025-05-07T19:50:43.8300894Z ------------ 2025-05-07T19:50:43.8301131Z ----------------- 2025-05-07T19:50:43.8301373Z ---------------------- 2025-05-07T19:50:43.8301641Z --------------------------- 2025-05-07T19:50:43.8301902Z -------------------------------- 2025-05-07T19:50:43.8302211Z -- Trying 'Ninja' generator - success 2025-05-07T19:50:43.8302585Z -------------------------------------------------------------------------------- 2025-05-07T19:50:43.8303202Z 2025-05-07T19:50:43.8313346Z Configuring Project 2025-05-07T19:50:43.8313610Z Working directory: 2025-05-07T19:50:43.8314019Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build 2025-05-07T19:50:43.8314443Z Command: 2025-05-07T19:50:43.8333289Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/cmake/data/bin/cmake /__w/FBGEMM/FBGEMM/fbgemm_gpu -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install -DPYTHON_VERSION_STRING:STRING=3.12.2 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPYTHON_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.12 -DPYTHON_LIBRARY:PATH=/github/home/miniconda/envs/build_binary/lib/libpython3.12.so -DPython_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.12 
-DPython_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/numpy/_core/include -DPython3_EXECUTABLE:PATH=/github/home/miniconda/envs/build_binary/bin/python -DPython3_ROOT_DIR:PATH=/github/home/miniconda/envs/build_binary -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/github/home/miniconda/envs/build_binary/include/python3.12 -DPython3_NumPy_INCLUDE_DIRS:PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/numpy/_core/include -DCMAKE_MAKE_PROGRAM:FILEPATH=/github/home/miniconda/envs/build_binary/bin/ninja -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch -D_GLIBCXX_USE_CXX11_ABI=1 -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DFBGEMM_BUILD_TARGET=genai -DFBGEMM_BUILD_VARIANT=cuda -DNVML_LIB_PATH=/github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -DNCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -DNCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 '-DCMAKE_C_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib'"'"'' '-DCMAKE_CXX_FLAGS='"'"'-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib'"'"'' '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 '-DTORCH_CUDA_ARCH_LIST=7.0;8.0;9.0;9.0a;10.0a;12.0a' -DCUDA_TOOLKIT_ROOT_DIR=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCUDAToolkit_ROOT=/github/home/miniconda/envs/build_binary/targets/x86_64-linux -DCMAKE_CXX_STANDARD=20 -DCMAKE_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ar -DCMAKE_CXX_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_C_COMPILER_AR=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ar -DCMAKE_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ranlib -DCMAKE_CXX_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_C_COMPILER_RANLIB=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-gcc-ranlib -DCMAKE_LINKER=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-ld -DCMAKE_STRIP=/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-strip -DCMAKE_BUILD_TYPE=Release 2025-05-07T19:50:43.8351838Z 2025-05-07T19:50:43.8750291Z 2025-05-07T19:50:43.8751043Z Not searching for unused variables given on the command line. 
2025-05-07T19:50:43.8752057Z 2025-05-07T19:50:43.8752389Z ================================================================================ 2025-05-07T19:50:43.8753388Z Default C compiler flags 2025-05-07T19:50:43.8754413Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:50:43.8755743Z 2025-05-07T19:50:43.8757097Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib 2025-05-07T19:50:43.8758960Z ================================================================================ 2025-05-07T19:50:43.8759180Z 2025-05-07T19:50:43.8759183Z 2025-05-07T19:50:43.8759187Z 2025-05-07T19:50:43.8759297Z ================================================================================ 2025-05-07T19:50:43.8759628Z Default C++ compiler flags 2025-05-07T19:50:43.8759985Z (values may be overridden by CMAKE_CXX_STANDARD and CXX_STANDARD): 2025-05-07T19:50:43.8760263Z 2025-05-07T19:50:43.8760673Z -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib 2025-05-07T19:50:43.8761310Z ================================================================================ 2025-05-07T19:50:43.8761535Z 2025-05-07T19:50:43.8761539Z 2025-05-07T19:50:43.8761542Z 2025-05-07T19:50:43.8761656Z ================================================================================ 2025-05-07T19:50:43.8761971Z AVX2_FLAGS: 2025-05-07T19:50:43.8762085Z 2025-05-07T19:50:43.8762181Z -mavx2 2025-05-07T19:50:43.8762376Z -mf16c 2025-05-07T19:50:43.8762575Z -mfma 2025-05-07T19:50:43.8762763Z -fopenmp 2025-05-07T19:50:43.8763027Z ================================================================================ 2025-05-07T19:50:43.8763250Z 2025-05-07T19:50:43.8763253Z 2025-05-07T19:50:43.8763256Z 2025-05-07T19:50:43.8763374Z ================================================================================ 2025-05-07T19:50:43.8763715Z AVX512_FLAGS: 
2025-05-07T19:50:43.8763842Z 2025-05-07T19:50:43.8763952Z -mavx2 2025-05-07T19:50:43.8764142Z -mf16c 2025-05-07T19:50:43.8764358Z -mfma 2025-05-07T19:50:43.8764562Z -mavx512f 2025-05-07T19:50:43.8764806Z -mavx512bw 2025-05-07T19:50:43.8765014Z -mavx512dq 2025-05-07T19:50:43.8765249Z -mavx512vl 2025-05-07T19:50:43.8765456Z -fopenmp 2025-05-07T19:50:43.8765723Z ================================================================================ 2025-05-07T19:50:43.8765944Z 2025-05-07T19:50:43.8765947Z 2025-05-07T19:50:43.8765951Z 2025-05-07T19:50:43.8766098Z ================================================================================ 2025-05-07T19:50:43.8766441Z The project is built using scikit-build 2025-05-07T19:50:43.8766803Z ================================================================================ 2025-05-07T19:50:43.8767024Z 2025-05-07T19:50:43.8767027Z 2025-05-07T19:50:43.8767030Z 2025-05-07T19:50:43.8767150Z ================================================================================ 2025-05-07T19:50:43.8767498Z Build Settings 2025-05-07T19:50:43.8767632Z 2025-05-07T19:50:43.8767772Z FBGEMM_BUILD_TARGET : genai 2025-05-07T19:50:43.8768050Z FBGEMM_BUILD_VARIANT : cuda 2025-05-07T19:50:43.8768350Z 2025-05-07T19:50:43.8768482Z NVCC_VERBOSE : 2025-05-07T19:50:43.8768743Z CUDNN_INCLUDE_DIR : 2025-05-07T19:50:43.8769034Z CUDNN_LIBRARY : 2025-05-07T19:50:43.8769453Z NVML_LIB_PATH : /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:43.8769951Z TORCH_CUDA_ARCH_LIST : 7.0 2025-05-07T19:50:43.8770215Z 8.0 2025-05-07T19:50:43.8770442Z 9.0 2025-05-07T19:50:43.8770623Z 9.0a 2025-05-07T19:50:43.8770825Z 10.0a 2025-05-07T19:50:43.8771010Z 12.0a 2025-05-07T19:50:43.8771134Z 2025-05-07T19:50:43.8771225Z HIP_ROOT_DIR : 2025-05-07T19:50:43.8771489Z HIPCC_VERBOSE : 2025-05-07T19:50:43.8771725Z AMDGPU_TARGETS : 2025-05-07T19:50:43.8771980Z PYTORCH_ROCM_ARCH : 2025-05-07T19:50:43.8772236Z 
================================================================================ 2025-05-07T19:50:43.8772467Z 2025-05-07T19:50:43.9529443Z -- The CXX compiler identification is GNU 11.4.0 2025-05-07T19:50:43.9924921Z -- The C compiler identification is GNU 11.4.0 2025-05-07T19:50:44.9391535Z -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0 2025-05-07T19:50:44.9489597Z -- Detecting CXX compiler ABI info 2025-05-07T19:50:45.0442481Z -- Detecting CXX compiler ABI info - done 2025-05-07T19:50:45.0633374Z -- Check for working CXX compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ - skipped 2025-05-07T19:50:45.0635281Z -- Detecting CXX compile features 2025-05-07T19:50:45.0641677Z -- Detecting CXX compile features - done 2025-05-07T19:50:45.0765215Z -- Detecting C compiler ABI info 2025-05-07T19:50:45.1636021Z -- Detecting C compiler ABI info - done 2025-05-07T19:50:45.1814347Z -- Check for working C compiler: /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-cc - skipped 2025-05-07T19:50:45.1815661Z -- Detecting C compile features 2025-05-07T19:50:45.1821666Z -- Detecting C compile features - done 2025-05-07T19:50:45.1926306Z -- Detecting CUDA compiler ABI info 2025-05-07T19:50:46.1198957Z -- Detecting CUDA compiler ABI info - done 2025-05-07T19:50:46.1765850Z -- Check for working CUDA compiler: /github/home/miniconda/envs/build_binary/bin/nvcc - skipped 2025-05-07T19:50:46.1786497Z -- Detecting CUDA compile features 2025-05-07T19:50:46.1789218Z -- Detecting CUDA compile features - done 2025-05-07T19:50:46.1865124Z -- Performing Test C_HAS_AVX_1 2025-05-07T19:50:46.4436211Z -- Performing Test C_HAS_AVX_1 - Failed 2025-05-07T19:50:46.4437329Z -- Performing Test C_HAS_AVX_2 2025-05-07T19:50:46.7178312Z -- Performing Test C_HAS_AVX_2 - Success 2025-05-07T19:50:46.7178817Z -- Performing Test C_HAS_AVX2_1 2025-05-07T19:50:46.9749531Z -- Performing Test C_HAS_AVX2_1 - Failed 
2025-05-07T19:50:46.9752074Z -- Performing Test C_HAS_AVX2_2 2025-05-07T19:50:47.2441986Z -- Performing Test C_HAS_AVX2_2 - Success 2025-05-07T19:50:47.2443039Z -- Performing Test C_HAS_AVX512_1 2025-05-07T19:50:47.5010779Z -- Performing Test C_HAS_AVX512_1 - Failed 2025-05-07T19:50:47.5011853Z -- Performing Test C_HAS_AVX512_2 2025-05-07T19:50:47.7186437Z -- Performing Test C_HAS_AVX512_2 - Success 2025-05-07T19:50:47.7187379Z -- Performing Test CXX_HAS_AVX_1 2025-05-07T19:50:47.9755667Z -- Performing Test CXX_HAS_AVX_1 - Failed 2025-05-07T19:50:47.9756659Z -- Performing Test CXX_HAS_AVX_2 2025-05-07T19:50:48.2500775Z -- Performing Test CXX_HAS_AVX_2 - Success 2025-05-07T19:50:48.2501807Z -- Performing Test CXX_HAS_AVX2_1 2025-05-07T19:50:48.5070724Z -- Performing Test CXX_HAS_AVX2_1 - Failed 2025-05-07T19:50:48.5071755Z -- Performing Test CXX_HAS_AVX2_2 2025-05-07T19:50:48.7754396Z -- Performing Test CXX_HAS_AVX2_2 - Success 2025-05-07T19:50:48.7755449Z -- Performing Test CXX_HAS_AVX512_1 2025-05-07T19:50:49.0325740Z -- Performing Test CXX_HAS_AVX512_1 - Failed 2025-05-07T19:50:49.0326775Z -- Performing Test CXX_HAS_AVX512_2 2025-05-07T19:50:49.2499142Z -- Performing Test CXX_HAS_AVX512_2 - Success 2025-05-07T19:50:49.2673520Z -- Found CUDA: /github/home/miniconda/envs/build_binary/targets/x86_64-linux (found version "12.8") 2025-05-07T19:50:49.2706655Z -- Found CUDAToolkit: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include (found version "12.8.61") 2025-05-07T19:50:49.2787558Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD 2025-05-07T19:50:49.3685397Z -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed 2025-05-07T19:50:49.3686564Z -- Looking for pthread_create in pthreads 2025-05-07T19:50:49.4461902Z -- Looking for pthread_create in pthreads - not found 2025-05-07T19:50:49.4463074Z -- Looking for pthread_create in pthread 2025-05-07T19:50:49.5350567Z -- Looking for pthread_create in pthread - found 2025-05-07T19:50:49.5362235Z -- Found Threads: 
TRUE 2025-05-07T19:50:49.6974856Z -- PyTorch: CUDA detected: 12.8 2025-05-07T19:50:49.6976389Z -- PyTorch: CUDA nvcc is: /github/home/miniconda/envs/build_binary/targets/x86_64-linux/bin/nvcc 2025-05-07T19:50:49.6978565Z -- PyTorch: CUDA toolkit directory: /github/home/miniconda/envs/build_binary/targets/x86_64-linux 2025-05-07T19:50:49.8206732Z -- PyTorch: Header version is: 12.8 2025-05-07T19:50:49.9160230Z -- Found Python: /github/home/miniconda/envs/build_binary/bin/python (found version "3.12.2") found components: Interpreter 2025-05-07T19:50:49.9174627Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message): 2025-05-07T19:50:49.9175494Z Failed to compute shorthash for libnvrtc.so 2025-05-07T19:50:49.9175831Z Call Stack (most recent call first): 2025-05-07T19:50:49.9176539Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include) 2025-05-07T19:50:49.9177662Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) 2025-05-07T19:50:49.9178725Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:50:49.9179155Z CMakeLists.txt:112 (include) 2025-05-07T19:50:49.9179342Z 2025-05-07T19:50:49.9179347Z 2025-05-07T19:50:49.9179497Z -- USE_CUDNN is set to 0. Compiling without cuDNN support 2025-05-07T19:50:49.9179958Z -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support 2025-05-07T19:50:49.9180397Z -- USE_CUDSS is set to 0. Compiling without cuDSS support 2025-05-07T19:50:49.9180783Z -- USE_CUFILE is set to 0. 
Compiling without cuFile support 2025-05-07T19:50:49.9181816Z -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90a,code=sm_90a;-gencode;arch=compute_100a,code=sm_100a;-gencode;arch=compute_120a,code=sm_120a 2025-05-07T19:50:49.9522520Z CMake Warning at /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): 2025-05-07T19:50:49.9524966Z static library kineto_LIBRARY-NOTFOUND not found. 2025-05-07T19:50:49.9526022Z Call Stack (most recent call first): 2025-05-07T19:50:49.9528426Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found) 2025-05-07T19:50:49.9529359Z /__w/FBGEMM/FBGEMM/cmake/modules/PyTorchSetup.cmake:14 (find_package) 2025-05-07T19:50:49.9529802Z CMakeLists.txt:112 (include) 2025-05-07T19:50:49.9530000Z 2025-05-07T19:50:49.9530005Z 2025-05-07T19:50:49.9530036Z 2025-05-07T19:50:49.9530040Z 2025-05-07T19:50:49.9530160Z ================================================================================ 2025-05-07T19:50:49.9530499Z PyTorch Flags: 2025-05-07T19:50:49.9530713Z 2025-05-07T19:50:49.9530926Z TORCH_INCLUDE_DIRS: 2025-05-07T19:50:49.9531347Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include 2025-05-07T19:50:49.9532142Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:49.9532736Z 2025-05-07T19:50:49.9532930Z TORCH_LIBRARIES: 2025-05-07T19:50:49.9533425Z torch 2025-05-07T19:50:49.9533618Z torch_library 2025-05-07T19:50:49.9534076Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so 2025-05-07T19:50:49.9534750Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:49.9535462Z 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:49.9535984Z 2025-05-07T19:50:49.9536206Z TORCH_CUDA_OPTIONS: 2025-05-07T19:50:49.9536583Z --expt-relaxed-constexpr 2025-05-07T19:50:49.9536851Z -D__CUDA_NO_HALF_OPERATORS__ 2025-05-07T19:50:49.9537138Z -D__CUDA_NO_BFLOAT16_CONVERSIONS__ 2025-05-07T19:50:49.9537422Z -D__CUDA_NO_HALF2_OPERATORS__ 2025-05-07T19:50:49.9537724Z ================================================================================ 2025-05-07T19:50:49.9537954Z 2025-05-07T19:50:49.9537958Z 2025-05-07T19:50:49.9537961Z 2025-05-07T19:50:49.9538076Z ================================================================================ 2025-05-07T19:50:49.9538405Z NCCL Flags 2025-05-07T19:50:49.9538526Z 2025-05-07T19:50:49.9539016Z NCCL_INCLUDE_DIRS=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 2025-05-07T19:50:49.9539899Z NCCL_LIBRARIES=/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:49.9540543Z ================================================================================ 2025-05-07T19:50:49.9540769Z 2025-05-07T19:50:49.9541133Z -- Found Torch: /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so 2025-05-07T19:50:49.9541646Z 2025-05-07T19:50:49.9541650Z 2025-05-07T19:50:49.9541769Z ================================================================================ 2025-05-07T19:50:49.9542098Z CUDA Driver Path 2025-05-07T19:50:49.9542232Z 2025-05-07T19:50:49.9542580Z CUDA_DRIVER_LIBRARIES=/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:49.9543177Z ================================================================================ 2025-05-07T19:50:49.9543405Z 2025-05-07T19:50:49.9543836Z -- Found NVML_LIB_PATH: /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 
2025-05-07T19:50:49.9556226Z 2025-05-07T19:50:49.9556319Z 2025-05-07T19:50:49.9556833Z ================================================================================ 2025-05-07T19:50:49.9557879Z GPU CPP Library Target: asmjit (SHARED) 2025-05-07T19:50:49.9558210Z 2025-05-07T19:50:49.9558406Z CPU_SRCS: 2025-05-07T19:50:49.9558547Z 2025-05-07T19:50:49.9558631Z 2025-05-07T19:50:49.9558819Z GPU_SRCS: 2025-05-07T19:50:49.9558959Z 2025-05-07T19:50:49.9559043Z 2025-05-07T19:50:49.9559240Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:49.9559404Z 2025-05-07T19:50:49.9559488Z 2025-05-07T19:50:49.9559701Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:49.9559845Z 2025-05-07T19:50:49.9559926Z 2025-05-07T19:50:49.9560138Z OTHER_SRCS: 2025-05-07T19:50:49.9560523Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:50:49.9561182Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:50:49.9561784Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:50:49.9562417Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:50:49.9563020Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:50:49.9563617Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:50:49.9564208Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:50:49.9564792Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:50:49.9565391Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:50:49.9565972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:50:49.9566875Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:50:49.9567480Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:50:49.9568058Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:50:49.9568823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:50:49.9569403Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:50:49.9570025Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:50:49.9570614Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:50:49.9571216Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:50:49.9571810Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:50:49.9572397Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:50:49.9573081Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:50:49.9573674Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:50:49.9574298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:50:49.9574924Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:50:49.9575528Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:50:49.9576120Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:50:49.9576719Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:50:49.9577343Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:50:49.9577904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/inst.cpp 2025-05-07T19:50:49.9578480Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:50:49.9579188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:50:49.9579781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:50:49.9580362Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:50:49.9580913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:50:49.9581483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:50:49.9582032Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:50:49.9582598Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:50:49.9583161Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:50:49.9584038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:50:49.9584631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:50:49.9585195Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:50:49.9585773Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:50:49.9586330Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:50:49.9586909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:50:49.9587493Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:50:49.9588071Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:50:49.9588674Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:50:49.9589257Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonetree.cpp 
2025-05-07T19:50:49.9590865Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:50:49.9591471Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:50:49.9592136Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:50:49.9592713Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:50:49.9593316Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:50:49.9593903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:50:49.9594483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:50:49.9595053Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:50:49.9595616Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:50:49.9596198Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:50:49.9596834Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:50:49.9597271Z 2025-05-07T19:50:49.9597456Z CC_FLAGS: 2025-05-07T19:50:49.9597587Z 2025-05-07T19:50:49.9597665Z 2025-05-07T19:50:49.9597865Z NVCC_FLAGS: 2025-05-07T19:50:49.9597982Z 2025-05-07T19:50:49.9598062Z 2025-05-07T19:50:49.9598264Z HIPCC_FLAGS: 2025-05-07T19:50:49.9598387Z 2025-05-07T19:50:49.9598465Z 2025-05-07T19:50:49.9598671Z INCLUDE_DIRS: 2025-05-07T19:50:49.9598906Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:49.9599234Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:49.9599514Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:49.9599832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:49.9600313Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include 2025-05-07T19:50:49.9601096Z 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:49.9601742Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:49.9602145Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:49.9602578Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:49.9603032Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:49.9603550Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:50:49.9603996Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:49.9604554Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 2025-05-07T19:50:49.9605057Z 2025-05-07T19:50:49.9605252Z Selected Source Files: 2025-05-07T19:50:49.9605402Z 2025-05-07T19:50:49.9605498Z 2025-05-07T19:50:49.9605693Z HIPified Source Files: 2025-05-07T19:50:49.9605864Z 2025-05-07T19:50:49.9605944Z 2025-05-07T19:50:49.9606136Z Library Dependencies: 2025-05-07T19:50:49.9606381Z torch 2025-05-07T19:50:49.9606570Z torch_library 2025-05-07T19:50:49.9607012Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so 2025-05-07T19:50:49.9607688Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:49.9608365Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:49.9609563Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:49.9610303Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:49.9610929Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:49.9611344Z 2025-05-07T19:50:49.9611534Z Output Library: 2025-05-07T19:50:49.9611762Z asmjit 2025-05-07T19:50:49.9611958Z 2025-05-07T19:50:49.9612199Z 
Destination Directory: 2025-05-07T19:50:49.9612450Z fbgemm_gpu 2025-05-07T19:50:49.9612727Z ================================================================================ 2025-05-07T19:50:49.9613120Z 2025-05-07T19:50:49.9613125Z 2025-05-07T19:50:49.9613128Z 2025-05-07T19:50:49.9613262Z ================================================================================ 2025-05-07T19:50:49.9613652Z GPU CPP Library Target: fbgemm (SHARED) 2025-05-07T19:50:49.9613991Z 2025-05-07T19:50:49.9614195Z CPU_SRCS: 2025-05-07T19:50:49.9614323Z 2025-05-07T19:50:49.9614439Z 2025-05-07T19:50:49.9614637Z GPU_SRCS: 2025-05-07T19:50:49.9614760Z 2025-05-07T19:50:49.9614869Z 2025-05-07T19:50:49.9615075Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:49.9615247Z 2025-05-07T19:50:49.9615332Z 2025-05-07T19:50:49.9615538Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:49.9615707Z 2025-05-07T19:50:49.9615794Z 2025-05-07T19:50:49.9615990Z OTHER_SRCS: 2025-05-07T19:50:49.9616291Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDM.cc 2025-05-07T19:50:49.9616768Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:50:49.9617243Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMNBit.cc 2025-05-07T19:50:49.9617690Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtils.cc 2025-05-07T19:50:49.9618248Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RefImplementations.cc 2025-05-07T19:50:49.9618770Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/RowWiseSparseAdagradFused.cc 2025-05-07T19:50:49.9619241Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/SparseAdagrad.cc 2025-05-07T19:50:49.9619650Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/Utils.cc 2025-05-07T19:50:49.9620080Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:50:49.9620514Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:50:49.9620973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:50:49.9621512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/QuantUtilsAvx2.cc 2025-05-07T19:50:49.9621949Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:50:49.9622297Z 2025-05-07T19:50:49.9622510Z CC_FLAGS: 2025-05-07T19:50:49.9622627Z 2025-05-07T19:50:49.9622718Z 2025-05-07T19:50:49.9622931Z NVCC_FLAGS: 2025-05-07T19:50:49.9623050Z 2025-05-07T19:50:49.9623159Z 2025-05-07T19:50:49.9623354Z HIPCC_FLAGS: 2025-05-07T19:50:49.9623563Z 2025-05-07T19:50:49.9623672Z 2025-05-07T19:50:49.9624037Z INCLUDE_DIRS: 2025-05-07T19:50:49.9624319Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:49.9624762Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:49.9625050Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:49.9625388Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:49.9625879Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include 2025-05-07T19:50:49.9626685Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:49.9627329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:49.9627761Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:49.9628208Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:49.9628689Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:49.9629226Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:50:49.9629687Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:49.9630269Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 2025-05-07T19:50:49.9630772Z 2025-05-07T19:50:49.9630996Z Selected Source Files: 2025-05-07T19:50:49.9631153Z 2025-05-07T19:50:49.9631238Z 2025-05-07T19:50:49.9631461Z HIPified Source Files: 2025-05-07T19:50:49.9631617Z 2025-05-07T19:50:49.9631722Z 2025-05-07T19:50:49.9631927Z Library Dependencies: 2025-05-07T19:50:49.9632187Z torch 2025-05-07T19:50:49.9632385Z torch_library 
2025-05-07T19:50:49.9632838Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so 2025-05-07T19:50:49.9633513Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:49.9634317Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:49.9635117Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:49.9635876Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:49.9636468Z asmjit 2025-05-07T19:50:49.9636769Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:49.9637156Z 2025-05-07T19:50:49.9637334Z Output Library: 2025-05-07T19:50:49.9637551Z fbgemm 2025-05-07T19:50:49.9637726Z 2025-05-07T19:50:49.9637928Z Destination Directory: 2025-05-07T19:50:49.9638149Z fbgemm_gpu 2025-05-07T19:50:49.9638380Z ================================================================================ 2025-05-07T19:50:49.9638595Z 2025-05-07T19:50:49.9638599Z 2025-05-07T19:50:49.9638602Z 2025-05-07T19:50:49.9638728Z ================================================================================ 2025-05-07T19:50:49.9639045Z Running code generation script ... 
2025-05-07T19:50:49.9639822Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py --opensource 2025-05-07T19:50:49.9640536Z ================================================================================ 2025-05-07T19:50:49.9640765Z 2025-05-07T19:50:50.5136710Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:50.5138040Z [GENERAATE BACKWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_backward_split.py', '--opensource'] 2025-05-07T19:50:50.5138785Z Written: gen_embedding_backward_dense_split_weighted_vbe_cuda.cu 2025-05-07T19:50:50.5139277Z Written: gen_embedding_backward_dense_split_weighted_cuda.cu 2025-05-07T19:50:50.5139758Z Written: gen_embedding_backward_dense_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.5140401Z Written: gen_embedding_backward_dense_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:50.5140912Z Written: gen_embedding_backward_dense_split_unweighted_cuda.cu 2025-05-07T19:50:50.5141388Z Written: gen_embedding_backward_dense_split_weighted_vbe_meta.cpp 2025-05-07T19:50:50.5141994Z Written: gen_embedding_backward_dense_split_weighted_meta.cpp 2025-05-07T19:50:50.5142453Z Written: gen_embedding_backward_dense_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.5142953Z Written: gen_embedding_backward_dense_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:50.5143539Z Written: gen_embedding_backward_dense_split_unweighted_meta.cpp 2025-05-07T19:50:50.5144255Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.5144793Z Written: gen_embedding_backward_dense_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.5145324Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.5145899Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.5146435Z Written: 
gen_embedding_backward_dense_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.5146989Z Written: gen_embedding_backward_dense_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.5147516Z Written: gen_embedding_backward_dense_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.5148065Z Written: gen_embedding_backward_dense_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.5148699Z Written: gen_embedding_backward_dense_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.5149237Z Written: gen_embedding_backward_dense_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.5149753Z Written: gen_embedding_optimizer_dense_split_device_kernel.cuh 2025-05-07T19:50:50.5150310Z Written: gen_embedding_backward_split_dense.cpp 2025-05-07T19:50:50.5150660Z Written: gen_embedding_backward_dense_split_cpu.cpp 2025-05-07T19:50:50.5151073Z Written: gen_embedding_backward_adagrad_split_weighted_cuda.cu 2025-05-07T19:50:50.5151538Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.5152290Z Written: gen_embedding_backward_adagrad_split_unweighted_cuda.cu 2025-05-07T19:50:50.5152735Z Written: gen_embedding_backward_adagrad_split_weighted_meta.cpp 2025-05-07T19:50:50.5153223Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.5153720Z Written: gen_embedding_backward_adagrad_split_unweighted_meta.cpp 2025-05-07T19:50:50.5154186Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.5154713Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.5155237Z Written: gen_embedding_backward_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.5155749Z Written: gen_embedding_backward_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.5156263Z Written: gen_embedding_backward_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.5156805Z Written: 
gen_embedding_backward_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.5157296Z Written: gen_embedding_optimizer_adagrad_split_device_kernel.cuh 2025-05-07T19:50:50.5157702Z Written: gen_embedding_backward_split_adagrad.cpp 2025-05-07T19:50:50.5158221Z Written: gen_embedding_split_adagrad_pt2_autograd.cpp 2025-05-07T19:50:50.5158646Z Written: gen_embedding_backward_split_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.5159048Z Written: lookup_adagrad.py 2025-05-07T19:50:50.5159343Z Written: gen_embedding_backward_adagrad_split_cpu.cpp 2025-05-07T19:50:50.5159733Z Written: gen_embedding_backward_split_adagrad_cpu.cpp 2025-05-07T19:50:50.5160150Z Written: gen_embedding_backward_split_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.5160624Z Written: gen_embedding_backward_adam_split_weighted_vbe_cuda.cu 2025-05-07T19:50:50.5161081Z Written: gen_embedding_backward_adam_split_weighted_cuda.cu 2025-05-07T19:50:50.5161520Z Written: gen_embedding_backward_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.5161997Z Written: gen_embedding_backward_adam_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:50.5162443Z Written: gen_embedding_backward_adam_split_unweighted_cuda.cu 2025-05-07T19:50:50.5162894Z Written: gen_embedding_backward_adam_split_weighted_vbe_meta.cpp 2025-05-07T19:50:50.5163329Z Written: gen_embedding_backward_adam_split_weighted_meta.cpp 2025-05-07T19:50:50.5163794Z Written: gen_embedding_backward_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.5164281Z Written: gen_embedding_backward_adam_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:50.5164727Z Written: gen_embedding_backward_adam_split_unweighted_meta.cpp 2025-05-07T19:50:50.5165202Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.5165672Z Written: gen_embedding_backward_adam_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.5166178Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.5166688Z 
Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.5167194Z Written: gen_embedding_backward_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.5167698Z Written: gen_embedding_backward_adam_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.5168179Z Written: gen_embedding_backward_adam_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.5168687Z Written: gen_embedding_backward_adam_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.5169207Z Written: gen_embedding_backward_adam_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.5169719Z Written: gen_embedding_backward_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.5170174Z Written: gen_embedding_optimizer_adam_split_device_kernel.cuh 2025-05-07T19:50:50.5170585Z Written: gen_embedding_backward_split_adam.cpp 2025-05-07T19:50:50.5170953Z Written: gen_embedding_split_adam_pt2_autograd.cpp 2025-05-07T19:50:50.5171354Z Written: gen_embedding_backward_split_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.5171738Z Written: lookup_adam.py 2025-05-07T19:50:50.5172010Z Written: gen_embedding_backward_split_adam_cpu.cpp 2025-05-07T19:50:50.5172423Z Written: gen_embedding_backward_split_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.5172916Z Written: gen_embedding_backward_lamb_split_weighted_cuda.cu 2025-05-07T19:50:50.5173373Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.5173836Z Written: gen_embedding_backward_lamb_split_unweighted_cuda.cu 2025-05-07T19:50:50.5174259Z Written: gen_embedding_backward_lamb_split_weighted_meta.cpp 2025-05-07T19:50:50.5174720Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.5175174Z Written: gen_embedding_backward_lamb_split_unweighted_meta.cpp 2025-05-07T19:50:50.5175633Z Written: gen_embedding_backward_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.5176116Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_cta.cu 
2025-05-07T19:50:50.5176622Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.5177104Z Written: gen_embedding_backward_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.5177593Z Written: gen_embedding_backward_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.5178175Z Written: gen_embedding_backward_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.5178626Z Written: gen_embedding_optimizer_lamb_split_device_kernel.cuh 2025-05-07T19:50:50.5179030Z Written: gen_embedding_backward_split_lamb.cpp 2025-05-07T19:50:50.5179376Z Written: gen_embedding_split_lamb_pt2_autograd.cpp 2025-05-07T19:50:50.5179795Z Written: gen_embedding_backward_split_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.5180170Z Written: lookup_lamb.py 2025-05-07T19:50:50.5180443Z Written: gen_embedding_backward_split_lamb_cpu.cpp 2025-05-07T19:50:50.5180859Z Written: gen_embedding_backward_split_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.5181299Z Written: gen_embedding_backward_lars_sgd_split_weighted_cuda.cu 2025-05-07T19:50:50.5181784Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.5182264Z Written: gen_embedding_backward_lars_sgd_split_unweighted_cuda.cu 2025-05-07T19:50:50.5182736Z Written: gen_embedding_backward_lars_sgd_split_weighted_meta.cpp 2025-05-07T19:50:50.5183218Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.5184010Z Written: gen_embedding_backward_lars_sgd_split_unweighted_meta.cpp 2025-05-07T19:50:50.5184544Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.5185098Z Written: gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.5185680Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.5186216Z Written: gen_embedding_backward_lars_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.5186799Z Written: 
gen_embedding_backward_lars_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.5187382Z Written: gen_embedding_backward_lars_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.5187898Z Written: gen_embedding_optimizer_lars_sgd_split_device_kernel.cuh 2025-05-07T19:50:50.5188355Z Written: gen_embedding_backward_split_lars_sgd.cpp 2025-05-07T19:50:50.5188749Z Written: gen_embedding_split_lars_sgd_pt2_autograd.cpp 2025-05-07T19:50:50.5189222Z Written: gen_embedding_backward_split_lars_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.5189631Z Written: lookup_lars_sgd.py 2025-05-07T19:50:50.5189970Z Written: gen_embedding_backward_split_lars_sgd_cpu.cpp 2025-05-07T19:50:50.5190421Z Written: gen_embedding_backward_split_lars_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.5190970Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_cuda.cu 2025-05-07T19:50:50.5191579Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.5192186Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_cuda.cu 2025-05-07T19:50:50.5192781Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_meta.cpp 2025-05-07T19:50:50.5193383Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.5194097Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_meta.cpp 2025-05-07T19:50:50.5194720Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.5195368Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.5196042Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.5196751Z Written: gen_embedding_backward_partial_rowwise_adam_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.5197365Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_nobag_kernel_warp.cu 
2025-05-07T19:50:50.5197974Z Written: gen_embedding_backward_partial_rowwise_adam_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.6035354Z Written: gen_embedding_optimizer_partial_rowwise_adam_split_device_kernel.cuh 2025-05-07T19:50:50.6036032Z Written: gen_embedding_backward_split_partial_rowwise_adam.cpp 2025-05-07T19:50:50.6036556Z Written: gen_embedding_split_partial_rowwise_adam_pt2_autograd.cpp 2025-05-07T19:50:50.6037434Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.6037939Z Written: lookup_partial_rowwise_adam.py 2025-05-07T19:50:50.6038349Z Written: gen_embedding_backward_split_partial_rowwise_adam_cpu.cpp 2025-05-07T19:50:50.6038919Z Written: gen_embedding_backward_split_partial_rowwise_adam_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.6039502Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_cuda.cu 2025-05-07T19:50:50.6040132Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.6040746Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_cuda.cu 2025-05-07T19:50:50.6041347Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_meta.cpp 2025-05-07T19:50:50.6041965Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.6042583Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_meta.cpp 2025-05-07T19:50:50.6043207Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.6043857Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.6044531Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.6045175Z Written: gen_embedding_backward_partial_rowwise_lamb_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.6045931Z Written: 
gen_embedding_backward_partial_rowwise_lamb_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.6046759Z Written: gen_embedding_backward_partial_rowwise_lamb_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.6047327Z Written: gen_embedding_optimizer_partial_rowwise_lamb_split_device_kernel.cuh 2025-05-07T19:50:50.6047835Z Written: gen_embedding_backward_split_partial_rowwise_lamb.cpp 2025-05-07T19:50:50.6048285Z Written: gen_embedding_split_partial_rowwise_lamb_pt2_autograd.cpp 2025-05-07T19:50:50.6048814Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.6049277Z Written: lookup_partial_rowwise_lamb.py 2025-05-07T19:50:50.6049656Z Written: gen_embedding_backward_split_partial_rowwise_lamb_cpu.cpp 2025-05-07T19:50:50.6050176Z Written: gen_embedding_backward_split_partial_rowwise_lamb_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.6050705Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_cuda.cu 2025-05-07T19:50:50.6051230Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_cuda.cu 2025-05-07T19:50:50.6051752Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_cuda.cu 2025-05-07T19:50:50.6052236Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_cuda.cu 2025-05-07T19:50:50.6052768Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.6053307Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.6054017Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_cuda.cu 2025-05-07T19:50:50.6054548Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:50.6055085Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_cuda.cu 2025-05-07T19:50:50.6055606Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_cuda.cu 2025-05-07T19:50:50.6056119Z Written: 
gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_meta.cpp 2025-05-07T19:50:50.6056657Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_meta.cpp 2025-05-07T19:50:50.6057167Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_meta.cpp 2025-05-07T19:50:50.6057687Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_meta.cpp 2025-05-07T19:50:50.6058210Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.6058784Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.6059354Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_meta.cpp 2025-05-07T19:50:50.6059999Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:50.6060546Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_meta.cpp 2025-05-07T19:50:50.6061059Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_meta.cpp 2025-05-07T19:50:50.6061621Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.6062195Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.6062738Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_cta.cu 2025-05-07T19:50:50.6063287Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.6064164Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.6064847Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.6065484Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.6066112Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.6066728Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_cta.cu 
2025-05-07T19:50:50.6067308Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.6067915Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.6068518Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.6069120Z Written: gen_embedding_backward_rowwise_adagrad_ssd_weighted_kernel_warp.cu 2025-05-07T19:50:50.6069691Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.6070408Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.6071009Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.6071593Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.6072205Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.6072778Z Written: gen_embedding_backward_rowwise_adagrad_ssd_unweighted_kernel_warp.cu 2025-05-07T19:50:50.6073321Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.6073907Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:50:50.6074483Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_cta.cu 2025-05-07T19:50:50.6075086Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_cta.cu 2025-05-07T19:50:50.6075676Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_cta.cu 2025-05-07T19:50:50.6076351Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:50:50.6076952Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_kernel_warp.cu 2025-05-07T19:50:50.6077544Z Written: 
gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_kernel_warp.cu 2025-05-07T19:50:50.6078159Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_kernel_warp.cu 2025-05-07T19:50:50.6078723Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_vbe_gwd_cuda.cu 2025-05-07T19:50:50.6079265Z Written: gen_embedding_backward_rowwise_adagrad_split_weighted_gwd_cuda.cu 2025-05-07T19:50:50.6079817Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_vbe_gwd_cuda.cu 2025-05-07T19:50:50.6080360Z Written: gen_embedding_backward_rowwise_adagrad_split_unweighted_gwd_cuda.cu 2025-05-07T19:50:50.6080890Z Written: gen_embedding_optimizer_rowwise_adagrad_ssd_device_kernel.cuh 2025-05-07T19:50:50.6081393Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:50:50.6081865Z Written: gen_embedding_backward_ssd_rowwise_adagrad.cpp 2025-05-07T19:50:50.6082325Z Written: gen_embedding_ssd_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:50:50.6082805Z Written: gen_embedding_backward_ssd_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.6083236Z Written: lookup_rowwise_adagrad_ssd.py 2025-05-07T19:50:50.6083584Z Written: gen_embedding_backward_split_rowwise_adagrad.cpp 2025-05-07T19:50:50.6084015Z Written: gen_embedding_split_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:50:50.6084489Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.6084919Z Written: lookup_rowwise_adagrad.py 2025-05-07T19:50:50.6085266Z Written: gen_embedding_backward_rowwise_adagrad_split_cpu.cpp 2025-05-07T19:50:50.6085711Z Written: gen_embedding_backward_split_rowwise_adagrad_cpu.cpp 2025-05-07T19:50:50.6086184Z Written: gen_embedding_backward_split_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.6086740Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:50:50.6087265Z Written: gen_embedding_backward_split_approx_rowwise_adagrad.cpp 
2025-05-07T19:50:50.6087731Z Written: gen_embedding_split_approx_rowwise_adagrad_pt2_autograd.cpp 2025-05-07T19:50:50.6088268Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.6088794Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_cpu.cpp 2025-05-07T19:50:50.6089325Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.6089940Z Written: gen_embedding_optimizer_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:50:50.6090521Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:50:50.6091076Z Written: gen_embedding_split_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:50:50.6091670Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.6092292Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:50:50.6092885Z Written: gen_embedding_backward_split_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.6093562Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_weight_decay_split_device_kernel.cuh 2025-05-07T19:50:50.6094206Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay.cpp 2025-05-07T19:50:50.6094792Z Written: gen_embedding_split_approx_rowwise_adagrad_with_weight_decay_pt2_autograd.cpp 2025-05-07T19:50:50.6095456Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7120577Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_cpu.cpp 2025-05-07T19:50:50.7122821Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_weight_decay_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7125416Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_cuda.cu 2025-05-07T19:50:50.7126064Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_cuda.cu 2025-05-07T19:50:50.7126842Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.7127634Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:50.7128247Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_cuda.cu 2025-05-07T19:50:50.7128872Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_meta.cpp 2025-05-07T19:50:50.7129479Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_meta.cpp 2025-05-07T19:50:50.7130119Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.7130780Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:50.7131398Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_meta.cpp 2025-05-07T19:50:50.7132157Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.7132805Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.7133474Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.7134169Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.7134824Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.7135498Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.7136146Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.7136836Z Written: 
gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.7137538Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.7138198Z Written: gen_embedding_backward_rowwise_adagrad_with_counter_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.7138836Z Written: gen_embedding_optimizer_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:50:50.7139390Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter.cpp 2025-05-07T19:50:50.7139911Z Written: gen_embedding_split_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:50:50.7140477Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7140992Z Written: lookup_rowwise_adagrad_with_counter.py 2025-05-07T19:50:50.7141443Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:50:50.7142012Z Written: gen_embedding_backward_split_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7142670Z Written: gen_embedding_optimizer_approx_rowwise_adagrad_with_counter_split_device_kernel.cuh 2025-05-07T19:50:50.7143269Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter.cpp 2025-05-07T19:50:50.7144179Z Written: gen_embedding_split_approx_rowwise_adagrad_with_counter_pt2_autograd.cpp 2025-05-07T19:50:50.7144916Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7145606Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_cpu.cpp 2025-05-07T19:50:50.7146294Z Written: gen_embedding_backward_split_approx_rowwise_adagrad_with_counter_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7146958Z Written: gen_embedding_optimizer_rowwise_weighted_adagrad_split_device_kernel.cuh 2025-05-07T19:50:50.7147548Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad.cpp 
2025-05-07T19:50:50.7148068Z Written: gen_embedding_split_rowwise_weighted_adagrad_pt2_autograd.cpp 2025-05-07T19:50:50.7148796Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7149406Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_cpu.cpp 2025-05-07T19:50:50.7149988Z Written: gen_embedding_backward_split_rowwise_weighted_adagrad_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7150654Z Written: gen_embedding_backward_sgd_split_weighted_vbe_cuda.cu 2025-05-07T19:50:50.7151081Z Written: gen_embedding_backward_sgd_split_weighted_cuda.cu 2025-05-07T19:50:50.7151534Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.7151982Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_cuda.cu 2025-05-07T19:50:50.7152429Z Written: gen_embedding_backward_sgd_split_unweighted_cuda.cu 2025-05-07T19:50:50.7152871Z Written: gen_embedding_backward_sgd_split_weighted_vbe_meta.cpp 2025-05-07T19:50:50.7153297Z Written: gen_embedding_backward_sgd_split_weighted_meta.cpp 2025-05-07T19:50:50.7153760Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.7154285Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_meta.cpp 2025-05-07T19:50:50.7154742Z Written: gen_embedding_backward_sgd_split_unweighted_meta.cpp 2025-05-07T19:50:50.7155190Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.7155666Z Written: gen_embedding_backward_sgd_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.7156155Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.7156657Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_cta.cu 2025-05-07T19:50:50.7157155Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.7157628Z Written: gen_embedding_backward_sgd_split_weighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.7158113Z Written: 
gen_embedding_backward_sgd_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.7158602Z Written: gen_embedding_backward_sgd_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.7159135Z Written: gen_embedding_backward_sgd_split_unweighted_vbe_kernel_warp.cu 2025-05-07T19:50:50.7159638Z Written: gen_embedding_backward_sgd_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.7160085Z Written: gen_embedding_optimizer_sgd_split_device_kernel.cuh 2025-05-07T19:50:50.7160482Z Written: gen_embedding_backward_split_sgd.cpp 2025-05-07T19:50:50.7160822Z Written: gen_embedding_split_sgd_pt2_autograd.cpp 2025-05-07T19:50:50.7161231Z Written: gen_embedding_backward_split_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7161591Z Written: lookup_sgd.py 2025-05-07T19:50:50.7161880Z Written: gen_embedding_backward_sgd_split_cpu.cpp 2025-05-07T19:50:50.7162229Z Written: gen_embedding_backward_split_sgd_cpu.cpp 2025-05-07T19:50:50.7162634Z Written: gen_embedding_backward_split_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7163104Z Written: gen_embedding_optimizer_approx_sgd_split_device_kernel.cuh 2025-05-07T19:50:50.7163529Z Written: gen_embedding_backward_split_approx_sgd.cpp 2025-05-07T19:50:50.7163933Z Written: gen_embedding_split_approx_sgd_pt2_autograd.cpp 2025-05-07T19:50:50.7164379Z Written: gen_embedding_backward_split_approx_sgd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7164838Z Written: gen_embedding_backward_split_approx_sgd_cpu.cpp 2025-05-07T19:50:50.7165270Z Written: gen_embedding_backward_split_approx_sgd_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7165736Z Written: gen_embedding_backward_none_split_weighted_cuda.cu 2025-05-07T19:50:50.7166192Z Written: gen_embedding_backward_none_split_unweighted_nobag_cuda.cu 2025-05-07T19:50:50.7166641Z Written: gen_embedding_backward_none_split_unweighted_cuda.cu 2025-05-07T19:50:50.7167083Z Written: gen_embedding_backward_none_split_weighted_meta.cpp 2025-05-07T19:50:50.7167531Z Written: 
gen_embedding_backward_none_split_unweighted_nobag_meta.cpp 2025-05-07T19:50:50.7168001Z Written: gen_embedding_backward_none_split_unweighted_meta.cpp 2025-05-07T19:50:50.7168446Z Written: gen_embedding_backward_none_split_weighted_kernel_cta.cu 2025-05-07T19:50:50.7169019Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_cta.cu 2025-05-07T19:50:50.7169532Z Written: gen_embedding_backward_none_split_unweighted_kernel_cta.cu 2025-05-07T19:50:50.7169994Z Written: gen_embedding_backward_none_split_weighted_kernel_warp.cu 2025-05-07T19:50:50.7170504Z Written: gen_embedding_backward_none_split_unweighted_nobag_kernel_warp.cu 2025-05-07T19:50:50.7171006Z Written: gen_embedding_backward_none_split_unweighted_kernel_warp.cu 2025-05-07T19:50:50.7171481Z Written: gen_embedding_optimizer_none_split_device_kernel.cuh 2025-05-07T19:50:50.7171873Z Written: gen_embedding_backward_split_none.cpp 2025-05-07T19:50:50.7172237Z Written: gen_embedding_split_none_pt2_autograd.cpp 2025-05-07T19:50:50.7172655Z Written: gen_embedding_backward_split_none_pt2_cuda_wrapper.cpp 2025-05-07T19:50:50.7173017Z Written: lookup_none.py 2025-05-07T19:50:50.7173307Z Written: gen_embedding_backward_split_none_cpu.cpp 2025-05-07T19:50:50.7173703Z Written: gen_embedding_backward_split_none_pt2_cpu_wrapper.cpp 2025-05-07T19:50:50.7174234Z Written: gen_embedding_backward_split_weighted_device_kernel_hip.hip 2025-05-07T19:50:50.7174739Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel_hip.hip 2025-05-07T19:50:50.7175268Z Written: gen_embedding_backward_split_unweighted_device_kernel_hip.hip 2025-05-07T19:50:50.7175756Z Written: gen_embedding_backward_ssd_weighted_vbe_device_kernel.cuh 2025-05-07T19:50:50.7176220Z Written: gen_embedding_backward_split_weighted_vbe_device_kernel.cuh 2025-05-07T19:50:50.7176687Z Written: gen_embedding_backward_ssd_weighted_device_kernel.cuh 2025-05-07T19:50:50.7177124Z Written: gen_embedding_backward_split_weighted_device_kernel.cuh 
2025-05-07T19:50:50.7177611Z Written: gen_embedding_backward_ssd_unweighted_nobag_device_kernel.cuh 2025-05-07T19:50:50.7178107Z Written: gen_embedding_backward_split_unweighted_nobag_device_kernel.cuh 2025-05-07T19:50:50.7178618Z Written: gen_embedding_backward_ssd_unweighted_vbe_device_kernel.cuh 2025-05-07T19:50:50.7179119Z Written: gen_embedding_backward_split_unweighted_vbe_device_kernel.cuh 2025-05-07T19:50:50.7179590Z Written: gen_embedding_backward_ssd_unweighted_device_kernel.cuh 2025-05-07T19:50:50.7180060Z Written: gen_embedding_backward_split_unweighted_device_kernel.cuh 2025-05-07T19:50:50.7180507Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:50:50.7180956Z Written: gen_embedding_backward_split_grad_embedding_ops.cu 2025-05-07T19:50:50.7181400Z Written: gen_embedding_backward_dense_indice_weights_codegen_cuda.cu 2025-05-07T19:50:50.7181879Z Written: gen_embedding_backward_ssd_indice_weights_codegen_cuda.cu 2025-05-07T19:50:50.7182357Z Written: gen_embedding_backward_split_indice_weights_codegen_cuda.cu 2025-05-07T19:50:50.7182732Z Written: pt2_arg_utils.h 2025-05-07T19:50:50.7182979Z Written: __init__.py 2025-05-07T19:50:50.7183205Z Written: lookup_args_ssd.py 2025-05-07T19:50:50.7183565Z Written: lookup_args.py 2025-05-07T19:50:50.7251463Z 2025-05-07T19:50:50.7251481Z 2025-05-07T19:50:50.7252075Z ================================================================================ 2025-05-07T19:50:50.7253254Z Running code generation script ... 
2025-05-07T19:50:50.7255612Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py --opensource 2025-05-07T19:50:50.7257158Z ================================================================================ 2025-05-07T19:50:50.7257429Z 2025-05-07T19:50:50.8325729Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:50.8327667Z [GENERATE OPTIMIZERS]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_embedding_optimizer.py', '--opensource'] 2025-05-07T19:50:50.8328440Z Written: gen_embedding_optimizer_rowwise_adagrad_split_cuda.cu 2025-05-07T19:50:50.8328934Z Written: gen_embedding_optimizer_rowwise_adagrad_split_kernel.cu 2025-05-07T19:50:50.8329832Z Written: gen_embedding_optimizer_rowwise_adagrad_split.cpp 2025-05-07T19:50:50.8330321Z Written: gen_embedding_optimizer_rowwise_adagrad_split_device_kernel.cuh 2025-05-07T19:50:50.8330816Z Written: split_embedding_optimizer_rowwise_adagrad.py 2025-05-07T19:50:50.8331199Z Written: optimizer_args.py 2025-05-07T19:50:50.8425198Z 2025-05-07T19:50:50.8425417Z 2025-05-07T19:50:50.8425789Z ================================================================================ 2025-05-07T19:50:50.8426226Z Running code generation script ... 
2025-05-07T19:50:50.8427012Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py --opensource 2025-05-07T19:50:50.8427834Z ================================================================================ 2025-05-07T19:50:50.8428190Z 2025-05-07T19:50:50.9627480Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:50.9628401Z [GENERATE FORWARD QUANTIZED]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_quantized.py', '--opensource'] 2025-05-07T19:50:50.9629561Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp32_codegen_cuda.cu 2025-05-07T19:50:50.9630260Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp16_codegen_cuda.cu 2025-05-07T19:50:50.9631177Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_fp8_codegen_cuda.cu 2025-05-07T19:50:50.9631800Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int8_codegen_cuda.cu 2025-05-07T19:50:50.9632446Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int4_codegen_cuda.cu 2025-05-07T19:50:50.9633088Z Written: gen_embedding_forward_quantized_split_nbit_kernel_weighted_int2_codegen_cuda.cu 2025-05-07T19:50:50.9633742Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp32_codegen_cuda.cu 2025-05-07T19:50:50.9634452Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp16_codegen_cuda.cu 2025-05-07T19:50:50.9635138Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_fp8_codegen_cuda.cu 2025-05-07T19:50:50.9635840Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int8_codegen_cuda.cu 2025-05-07T19:50:50.9636545Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int4_codegen_cuda.cu 2025-05-07T19:50:50.9637227Z Written: 
gen_embedding_forward_quantized_split_nbit_kernel_unweighted_nobag_int2_codegen_cuda.cu 2025-05-07T19:50:50.9637908Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp32_codegen_cuda.cu 2025-05-07T19:50:50.9638551Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp16_codegen_cuda.cu 2025-05-07T19:50:50.9639215Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_fp8_codegen_cuda.cu 2025-05-07T19:50:50.9639876Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int8_codegen_cuda.cu 2025-05-07T19:50:50.9640518Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int4_codegen_cuda.cu 2025-05-07T19:50:50.9641183Z Written: gen_embedding_forward_quantized_split_nbit_kernel_unweighted_int2_codegen_cuda.cu 2025-05-07T19:50:50.9641797Z Written: gen_embedding_forward_quantized_split_nbit_host_weighted_codegen_cuda.cu 2025-05-07T19:50:50.9642427Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_nobag_codegen_cuda.cu 2025-05-07T19:50:50.9643043Z Written: gen_embedding_forward_quantized_split_nbit_host_unweighted_codegen_cuda.cu 2025-05-07T19:50:50.9643607Z Written: gen_embedding_forward_quantized_weighted_codegen_cpu.cpp 2025-05-07T19:50:50.9644108Z Written: gen_embedding_forward_quantized_unweighted_codegen_cpu.cpp 2025-05-07T19:50:50.9728954Z 2025-05-07T19:50:50.9729025Z 2025-05-07T19:50:50.9729304Z ================================================================================ 2025-05-07T19:50:50.9729743Z Running code generation script ... 
2025-05-07T19:50:50.9731074Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py --opensource 2025-05-07T19:50:50.9731851Z ================================================================================ 2025-05-07T19:50:50.9732090Z 2025-05-07T19:50:51.3148434Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:51.3150953Z [GENERATE FORWARD SPLIT]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_forward_split.py', '--opensource'] 2025-05-07T19:50:51.3153063Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_cuda.cu 2025-05-07T19:50:51.3154486Z Written: gen_embedding_forward_dense_weighted_codegen_cuda.cu 2025-05-07T19:50:51.3155921Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:50:51.3156421Z Written: gen_embedding_forward_dense_unweighted_codegen_cuda.cu 2025-05-07T19:50:51.3156901Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_cuda.cu 2025-05-07T19:50:51.3157375Z Written: gen_embedding_forward_split_weighted_vbe_codegen_cuda.cu 2025-05-07T19:50:51.3158084Z Written: gen_embedding_forward_ssd_weighted_codegen_cuda.cu 2025-05-07T19:50:51.3158542Z Written: gen_embedding_forward_split_weighted_codegen_cuda.cu 2025-05-07T19:50:51.3159029Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:50:51.3159505Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_cuda.cu 2025-05-07T19:50:51.3159991Z Written: gen_embedding_forward_ssd_unweighted_codegen_cuda.cu 2025-05-07T19:50:51.3160462Z Written: gen_embedding_forward_split_unweighted_codegen_cuda.cu 2025-05-07T19:50:51.3160939Z Written: gen_embedding_forward_split_weighted_vbe_gwd_codegen_cuda.cu 2025-05-07T19:50:51.3161443Z Written: gen_embedding_forward_split_weighted_gwd_codegen_cuda.cu 2025-05-07T19:50:51.3161931Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_codegen_cuda.cu 
2025-05-07T19:50:51.3162446Z Written: gen_embedding_forward_split_unweighted_gwd_codegen_cuda.cu 2025-05-07T19:50:51.3162929Z Written: gen_embedding_forward_dense_weighted_vbe_codegen_meta.cpp 2025-05-07T19:50:51.3163425Z Written: gen_embedding_forward_dense_weighted_codegen_meta.cpp 2025-05-07T19:50:51.3163919Z Written: gen_embedding_forward_dense_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:50:51.3164392Z Written: gen_embedding_forward_dense_unweighted_codegen_meta.cpp 2025-05-07T19:50:51.3164870Z Written: gen_embedding_forward_ssd_weighted_vbe_codegen_meta.cpp 2025-05-07T19:50:51.3165332Z Written: gen_embedding_forward_split_weighted_vbe_codegen_meta.cpp 2025-05-07T19:50:51.3165808Z Written: gen_embedding_forward_ssd_weighted_codegen_meta.cpp 2025-05-07T19:50:51.3166250Z Written: gen_embedding_forward_split_weighted_codegen_meta.cpp 2025-05-07T19:50:51.3166731Z Written: gen_embedding_forward_ssd_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:50:51.3167238Z Written: gen_embedding_forward_split_unweighted_vbe_codegen_meta.cpp 2025-05-07T19:50:51.3167717Z Written: gen_embedding_forward_ssd_unweighted_codegen_meta.cpp 2025-05-07T19:50:51.3168194Z Written: gen_embedding_forward_split_unweighted_codegen_meta.cpp 2025-05-07T19:50:51.3168641Z Written: gen_embedding_forward_dense_weighted_vbe_kernel.cu 2025-05-07T19:50:51.3169086Z Written: gen_embedding_forward_dense_weighted_kernel.cu 2025-05-07T19:50:51.3169514Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel.cu 2025-05-07T19:50:51.3169984Z Written: gen_embedding_forward_dense_unweighted_vbe_kernel.cu 2025-05-07T19:50:51.3170432Z Written: gen_embedding_forward_dense_unweighted_kernel.cu 2025-05-07T19:50:51.3183634Z Written: gen_embedding_forward_ssd_weighted_vbe_kernel.cu 2025-05-07T19:50:51.3184385Z Written: gen_embedding_forward_split_weighted_vbe_kernel.cu 2025-05-07T19:50:51.3184909Z Written: gen_embedding_forward_ssd_weighted_kernel.cu 2025-05-07T19:50:51.3185340Z Written: 
gen_embedding_forward_split_weighted_kernel.cu 2025-05-07T19:50:51.3185782Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel.cu 2025-05-07T19:50:51.3186486Z Written: gen_embedding_forward_split_unweighted_nobag_kernel.cu 2025-05-07T19:50:51.3186976Z Written: gen_embedding_forward_ssd_unweighted_vbe_kernel.cu 2025-05-07T19:50:51.3187434Z Written: gen_embedding_forward_split_unweighted_vbe_kernel.cu 2025-05-07T19:50:51.3187902Z Written: gen_embedding_forward_ssd_unweighted_kernel.cu 2025-05-07T19:50:51.3188330Z Written: gen_embedding_forward_split_unweighted_kernel.cu 2025-05-07T19:50:51.3188796Z Written: gen_embedding_forward_split_weighted_vbe_gwd_kernel.cu 2025-05-07T19:50:51.3189255Z Written: gen_embedding_forward_split_weighted_gwd_kernel.cu 2025-05-07T19:50:51.3189740Z Written: gen_embedding_forward_split_unweighted_vbe_gwd_kernel.cu 2025-05-07T19:50:51.3190340Z Written: gen_embedding_forward_split_unweighted_gwd_kernel.cu 2025-05-07T19:50:51.3190754Z Written: gen_embedding_forward_split_weighted_v2_kernel.cu 2025-05-07T19:50:51.3191183Z Written: gen_embedding_forward_split_unweighted_v2_kernel.cu 2025-05-07T19:50:51.3191633Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:51.3192141Z Written: gen_embedding_forward_dense_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:51.3192683Z Written: gen_embedding_forward_ssd_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:51.3193180Z Written: gen_embedding_forward_split_unweighted_nobag_kernel_small.cu 2025-05-07T19:50:51.3193652Z Written: gen_embedding_forward_split_pt2_cuda_wrapper.cpp 2025-05-07T19:50:51.3194056Z Written: gen_embedding_forward_split_pt2_cpu_wrapper.cpp 2025-05-07T19:50:51.3194471Z Written: gen_embedding_forward_ssd_pt2_cuda_wrapper.cpp 2025-05-07T19:50:51.3266251Z 2025-05-07T19:50:51.3266269Z 2025-05-07T19:50:51.3266998Z ================================================================================ 2025-05-07T19:50:51.3267743Z Running code 
generation script ... 2025-05-07T19:50:51.3268629Z /github/home/miniconda/envs/build_binary/bin/python /__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py --opensource 2025-05-07T19:50:51.3269412Z ================================================================================ 2025-05-07T19:50:51.3269761Z 2025-05-07T19:50:51.5990705Z [ARGS PARSE] Parsed arguments: Namespace(install_dir='.', is_fbcode=False, is_rocm=False) 2025-05-07T19:50:51.5993147Z [INDEX SELECT GENERATOR]: ['/__w/FBGEMM/FBGEMM/fbgemm_gpu/codegen/genscript/generate_index_select.py', '--opensource'] 2025-05-07T19:50:51.5995174Z Written: gen_batch_index_select_dim0_forward_codegen_cuda.cu 2025-05-07T19:50:51.5996442Z Written: gen_batch_index_select_dim0_forward_kernel.cu 2025-05-07T19:50:51.5997688Z Written: gen_batch_index_select_dim0_forward_kernel_small.cu 2025-05-07T19:50:51.5998974Z Written: gen_batch_index_select_dim0_backward_codegen_cuda.cu 2025-05-07T19:50:51.5999752Z Written: gen_batch_index_select_dim0_backward_kernel_cta.cu 2025-05-07T19:50:51.6000166Z Written: gen_batch_index_select_dim0_backward_kernel_warp.cu 2025-05-07T19:50:51.6000646Z Written: gen_embedding_backward_split_batch_index_select_device_kernel.cuh 2025-05-07T19:50:51.6001135Z Written: gen_embedding_backward_split_grad_index_select.cu 2025-05-07T19:50:51.6001556Z Written: gen_embedding_backward_split_common_device_kernel.cuh 2025-05-07T19:50:51.6182909Z 2025-05-07T19:50:51.6183007Z 2025-05-07T19:50:51.6183738Z ================================================================================ 2025-05-07T19:50:51.6185231Z GPU CPP Library Target: fbgemm_gpu_experimental_gen_ai (SHARED) 2025-05-07T19:50:51.6186310Z 2025-05-07T19:50:51.6186863Z CPU_SRCS: 2025-05-07T19:50:51.6187940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:50:51.6189769Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:50:51.6190375Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:50:51.6190939Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:50:51.6191564Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:50:51.6192477Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:50:51.6193095Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:50:51.6193544Z 2025-05-07T19:50:51.6193735Z GPU_SRCS: 2025-05-07T19:50:51.6194121Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:50:51.6194726Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:50:51.6195303Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:50:51.6195848Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:50:51.6196437Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:50:51.6197087Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:50:51.6197666Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:50:51.6198107Z 2025-05-07T19:50:51.6198300Z CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:51.6198924Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 2025-05-07T19:50:51.6199730Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:50:51.6200569Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:50:51.6201483Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:50:51.6202289Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:50:51.6203139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:50:51.6204068Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:50:51.6205011Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:50:51.6205940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:50:51.6206859Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:50:51.6207790Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:50:51.6208890Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:50:51.6209890Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:50:51.6210825Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:50:51.6211755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:50:51.6212693Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:50:51.6213630Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:50:51.6214553Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:50:51.6215595Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:50:51.6216502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:50:51.6217420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:50:51.6218429Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:50:51.6219531Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:50:51.6220492Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:50:51.6221475Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:50:51.6222427Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:50:51.6223380Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:50:51.6224455Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:50:51.6225518Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:50:51.6226351Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:50:51.6227160Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:50:51.6228003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:50:51.6228801Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:50:51.6229628Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:50:51.6230597Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:50:51.6231754Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:50:51.6232894Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:50:51.6234014Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:50:51.6235154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:50:51.6236292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:50:51.6237412Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:50:51.6238707Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:50:51.6239877Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:50:51.6241082Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:50:51.6242380Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:50:51.6243502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:50:51.6244563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:50:51.6245722Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:50:51.6246620Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:50:51.6247420Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:50:51.6248173Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:50:51.6248933Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:50:51.6249724Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:50:51.6250458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:50:51.6251179Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:50:51.6251974Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:50:51.6252680Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:50:51.6253358Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:50:51.6254046Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:50:51.6254751Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:50:51.6255421Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:50:51.6256099Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:50:51.6256576Z 2025-05-07T19:50:51.6256767Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:51.6257142Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/ck_extensions.hip 2025-05-07T19:50:51.6257677Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gemm/gemm.cpp 2025-05-07T19:50:51.6258353Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/bf16_grouped_gemm.hip 2025-05-07T19:50:51.6259459Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x128_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6260846Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6262208Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x32x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6263651Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:50:51.6265310Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:50:51.6266794Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x64x128_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:51.6268254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6269731Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6271270Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6272725Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x128_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6274184Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x16x96x64_16x16_1x3_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6275626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x16x64_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6277138Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6278556Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x64x128_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6279900Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x32x96x128_16x16_2x3_16x8x1_16x8x1_1x32x1x4_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:50:51.6281240Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x128x64_32x32_2x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6282586Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_128x64x96x64_16x16_4x3_8x16x1_8x16x1_1x32x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6283937Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x128_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6285317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6286689Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x128x64_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6288035Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x224x64_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6289398Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x256x64_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6290905Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x128x96x64_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6292253Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6293636Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x128x128_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:51.6295008Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x16x64x128_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6296360Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x224x256x32_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6297781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x128x32_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6299139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x160x64_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6300480Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x192x64_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6301829Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x224x64_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6303161Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x256x256x64_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6304914Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x128x128_16x16_1x4_16x16x1_16x16x1_1x32x1x8_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:51.6306384Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x224x64_16x16_1x7_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6307821Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6309426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x32x96x64_16x16_1x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6310893Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x128x128_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6312379Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x192x128_16x16_4x3_16x16x1_16x16x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6314038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_256x64x96x64_16x16_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6315482Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6316915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x128_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6318350Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x16x64_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6319796Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x32x128_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:51.6321296Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x48x128_16x16_1x3_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6322640Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/bf16_grouped/kernels/bf16_grouped_64x16x64x128_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:51.6323640Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/ck_utility.hip 2025-05-07T19:50:51.6324324Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip 2025-05-07T19:50:51.6325192Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/fp8_rowwise_gemm.hip 2025-05-07T19:50:51.6326257Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x16x128_16x16_4x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6327602Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x128x32x128_32x32_2x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6328936Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6330286Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_4_split_k.hip 2025-05-07T19:50:51.6332074Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2_8_split_k.hip 2025-05-07T19:50:51.6333440Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6334757Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6336117Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:50:51.6337485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6338811Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6340170Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2_2_split_k.hip 2025-05-07T19:50:51.6341528Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6342877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2_2_split_k.hip 2025-05-07T19:50:51.6344541Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6345995Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6347426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6348864Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6350302Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x256_16x16_1x1_16x8x1_16x8x1_1x32x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6351730Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6353251Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x16x512_16x16_1x1_32x4x1_32x4x1_1x32x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6354681Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6356103Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6357555Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6358929Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_128x64x32x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6360250Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_16x16_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6361592Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6362927Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6364256Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:50:51.6365601Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6366942Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x128x64_32x32_2x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:50:51.6368268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_16x16_4x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6369608Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6370953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6372291Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6373626Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6374954Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x64x256_32x32_2x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6376295Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6377631Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6379057Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x128x128_16x16_5x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6380392Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x256x128_16x16_5x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6381724Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x160x96x128_16x16_5x3_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6383084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x128_16x16_1x1_16x16x1_8x32x1_1x16x1x16_4x4x1_1x1_intrawave_v2_8_split_k.hip 2025-05-07T19:50:51.6384816Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6386255Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6387689Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x128x128_16x16_6x4_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:50:51.6389134Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x192x128_16x16_6x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6390579Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x224x128_16x16_6x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6392016Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6393462Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x192x256x128_16x16_6x8_8x32x1_8x32x1_1x32x1x8_8x8x1_2x2_intrawave_v3.hip 2025-05-07T19:50:51.6394915Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x160x128_16x16_7x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6396459Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x192x128_16x16_7x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6397793Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6399132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6400461Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x128x128_32x32_4x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6401797Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6403128Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6404451Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6405852Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6407188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6408513Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_16x16_8x8_4x64x1_4x64x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6410297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x256x64_32x32_4x4_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:50:51.6411743Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_16x16_8x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6413452Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x256x96x128_32x32_2x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6414903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6416351Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6417781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x128_32x32_1x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6419236Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6420683Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x16x512_16x16_1x1_32x8x1_32x8x1_1x64x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6422275Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x128_32x32_1x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6423670Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6425358Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x256x128_32x32_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6426800Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6428231Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6429656Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x64x96x256_16x16_2x3_16x16x1_16x16x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6431104Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x80x128x256_16x16_5x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6432560Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_256x96x128x128_16x16_3x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6434063Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6435477Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6436959Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6438260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x4x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6439573Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6440956Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6442249Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise/kernels/fp8_rowwise_64x16x16x64_16x16_1x1_4x16x1_4x16x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6443359Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/fp8_rowwise_batched_gemm.hip 2025-05-07T19:50:51.6444547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6445981Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6447431Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6448871Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6450301Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x256_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6451750Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6453192Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6454614Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x16x32x512_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6456054Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x128x128_32x32_1x2_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6457494Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_128x32x64x128_32x32_1x1_8x16x1_8x16x1_1x16x1x8_8x8x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6458929Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6460438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v4.hip 2025-05-07T19:50:51.6461888Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v5.hip 2025-05-07T19:50:51.6463336Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6465121Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x160x128_32x32_1x5_8x32x1_8x32x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6466764Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x192x128_32x32_2x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6468334Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x256x128_32x32_2x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6469917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x64x128_32x32_2x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6471489Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x128x96x256_32x32_1x3_16x16x1_16x16x1_1x64x1x4_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6473047Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6474631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6476317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x128x128_16x16_8x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6477761Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x160x128_16x16_8x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6479208Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x192x128_16x16_8x6_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6480662Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6482097Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x256x256x128_16x16_8x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6483550Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x128x256_32x32_1x1_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6484994Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x32x64x512_16x16_1x2_32x8x1_32x8x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6486428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6487942Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x192x256_32x32_1x3_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6489392Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x128_32x32_1x1_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6490822Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_256x64x64x512_32x32_1x1_32x8x1_32x8x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6492256Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6493756Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_32x2x1_32x2x1_1x16x1x4_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6495156Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v1.hip 2025-05-07T19:50:51.6496549Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/kernels/fp8_rowwise_batched_64x16x16x512_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4_1x1_interwave_v2.hip 2025-05-07T19:50:51.6497706Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/fp8_rowwise_grouped_gemm.hip 2025-05-07T19:50:51.6498891Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6500334Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6501782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x256_16x16_1x1_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6503210Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6504947Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6506515Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x32x512_16x16_1x1_32x4x1_32x4x1_1x16x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6508062Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:50:51.6509754Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:50:51.6511310Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x64x256_16x16_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6512857Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x16x96x256_16x16_1x3_16x8x1_16x8x1_1x16x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6514513Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x16x128_16x16_1x1_8x16x1_8x16x1_1x16x1x8_2x2x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6516076Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_16x16_1x4_16x8x1_16x8x1_1x32x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6517700Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x32x64x256_32x32_1x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6519379Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_1x2_16x8x1_16x8x1_1x16x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6521044Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_128x64x64x256_32x32_2x1_16x8x1_16x8x1_1x16x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6522492Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6523946Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x128_32x32_2x2_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6525404Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x128x256_32x32_2x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6526853Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x224x128_16x16_4x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6528309Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x256x128_32x32_4x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6529767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x128x96x128_16x16_4x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6531213Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6532677Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v2.hip 2025-05-07T19:50:51.6534148Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x128x256_16x16_1x2_16x16x1_16x16x1_1x16x1x16_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6535594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6537056Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6538510Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x256_16x16_1x1_16x16x1_16x16x1_1x16x1x16_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6539949Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x16x64x512_16x16_1x1_32x8x1_32x8x1_1x16x1x16_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6541454Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x192x96x128_16x16_6x3_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6542911Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x224x256x128_16x16_7x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v3.hip 2025-05-07T19:50:51.6544627Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x128x64_32x32_4x2_4x64x1_4x64x1_1x32x1x8_8x8x1_1x1_interwave_v1.hip 2025-05-07T19:50:51.6546212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x160x128_32x32_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6547861Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x192x128_32x32_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6549433Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x224x128_16x16_8x7_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6551008Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_4x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6552597Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x256x256x128_32x32_8x2_8x32x1_8x32x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6554159Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x128x128_16x16_1x4_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_interwave_v2.hip 2025-05-07T19:50:51.6555735Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6557330Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x160x128_16x16_1x5_8x32x1_8x32x1_1x32x1x8_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6558764Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x256x128_16x16_1x8_8x32x1_8x32x1_1x32x1x8_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6560205Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6561643Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x32x512_16x16_1x1_32x8x1_32x8x1_1x32x1x8_4x4x1_1x1_intrawave_v2.hip 2025-05-07T19:50:51.6563066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x32x64x512_16x16_2x1_32x8x1_32x8x1_1x32x1x8_8x8x1_2x1_intrawave_v2.hip 2025-05-07T19:50:51.6564517Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_1x2_16x16x1_16x16x1_1x32x1x8_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6565989Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x128x256_32x32_2x1_16x16x1_16x16x1_1x16x1x16_8x8x1_1x1_intrawave_v3.hip 2025-05-07T19:50:51.6567442Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x160x128_16x16_2x5_8x32x1_8x32x1_1x64x1x4_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6568950Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_256x64x192x128_16x16_4x3_8x32x1_8x32x1_1x32x1x8_8x8x1_2x1_intrawave_v3.hip 2025-05-07T19:50:51.6570387Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x128_16x16_1x1_8x8x1_8x8x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6571805Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_interwave_v2.hip 2025-05-07T19:50:51.6573240Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x16x256_16x16_1x1_16x4x1_16x4x1_1x16x1x4_4x4x1_1x1_intrawave_v1.hip 2025-05-07T19:50:51.6574735Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x32x256_16x16_1x2_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6576162Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_interwave_v1.hip 2025-05-07T19:50:51.6577610Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_grouped/kernels/fp8_rowwise_grouped_64x16x64x256_16x16_1x4_16x4x1_16x4x1_1x16x1x4_8x8x1_1x2_intrawave_v1.hip 2025-05-07T19:50:51.6578703Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip 2025-05-07T19:50:51.6579195Z 2025-05-07T19:50:51.6579373Z OTHER_SRCS: 2025-05-07T19:50:51.6579489Z 2025-05-07T19:50:51.6579569Z 2025-05-07T19:50:51.6579760Z CC_FLAGS: 2025-05-07T19:50:51.6579873Z 2025-05-07T19:50:51.6579958Z 2025-05-07T19:50:51.6580123Z NVCC_FLAGS: 2025-05-07T19:50:51.6580237Z 2025-05-07T19:50:51.6580317Z 2025-05-07T19:50:51.6580488Z HIPCC_FLAGS: 2025-05-07T19:50:51.6580605Z 2025-05-07T19:50:51.6580688Z 2025-05-07T19:50:51.6580849Z INCLUDE_DIRS: 2025-05-07T19:50:51.6581066Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:51.6581346Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:51.6581612Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:51.6581887Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:51.6582341Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include 2025-05-07T19:50:51.6583081Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:51.6583730Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:51.6584322Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:51.6584743Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:51.6585263Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:51.6585777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 
2025-05-07T19:50:51.6586244Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:51.6586814Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 2025-05-07T19:50:51.6587408Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize 2025-05-07T19:50:51.6587777Z 2025-05-07T19:50:51.6587970Z Selected Source Files: 2025-05-07T19:50:51.6588363Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:50:51.6588953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:50:51.6589518Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:50:51.6590056Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:50:51.6590700Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:50:51.6591334Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:50:51.6591904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:50:51.6592491Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu 2025-05-07T19:50:51.6593089Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu 2025-05-07T19:50:51.6593660Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu 2025-05-07T19:50:51.6594191Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu 2025-05-07T19:50:51.6594768Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu 2025-05-07T19:50:51.6595388Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu 2025-05-07T19:50:51.6595950Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu 2025-05-07T19:50:51.6596768Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu 
2025-05-07T19:50:51.6597502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu 2025-05-07T19:50:51.6598256Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu 2025-05-07T19:50:51.6599085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu 2025-05-07T19:50:51.6599833Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu 2025-05-07T19:50:51.6600586Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T19:50:51.6601449Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T19:50:51.6602298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T19:50:51.6603162Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T19:50:51.6604012Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T19:50:51.6604849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T19:50:51.6605698Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T19:50:51.6606543Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T19:50:51.6607378Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T19:50:51.6608237Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T19:50:51.6609400Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T19:50:51.6610298Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T19:50:51.6611193Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T19:50:51.6612088Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T19:50:51.6612982Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T19:50:51.6613872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T19:50:51.6614867Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T19:50:51.6615774Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T19:50:51.6616659Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T19:50:51.6617711Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T19:50:51.6618594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T19:50:51.6619502Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T19:50:51.6620400Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T19:50:51.6621454Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T19:50:51.6622209Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu 2025-05-07T19:50:51.6622902Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu 2025-05-07T19:50:51.6623694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu 2025-05-07T19:50:51.6624613Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu 2025-05-07T19:50:51.6625382Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu 2025-05-07T19:50:51.6626333Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu 2025-05-07T19:50:51.6627438Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu 2025-05-07T19:50:51.6628554Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu 2025-05-07T19:50:51.6629661Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu 2025-05-07T19:50:51.6630752Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu 2025-05-07T19:50:51.6631852Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu 2025-05-07T19:50:51.6632953Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu 2025-05-07T19:50:51.6634043Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu 2025-05-07T19:50:51.6635139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu 2025-05-07T19:50:51.6636391Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu 2025-05-07T19:50:51.6637917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu 2025-05-07T19:50:51.6639013Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu 2025-05-07T19:50:51.6640244Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu 2025-05-07T19:50:51.6641582Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu 2025-05-07T19:50:51.6642546Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu 2025-05-07T19:50:51.6643383Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu 2025-05-07T19:50:51.6644201Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu 2025-05-07T19:50:51.6644991Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu 2025-05-07T19:50:51.6645847Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu 2025-05-07T19:50:51.6646642Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu 2025-05-07T19:50:51.6647388Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu 2025-05-07T19:50:51.6648254Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu 2025-05-07T19:50:51.6649000Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu 2025-05-07T19:50:51.6649733Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu 2025-05-07T19:50:51.6650485Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu 2025-05-07T19:50:51.6651324Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu 2025-05-07T19:50:51.6652046Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cuh 2025-05-07T19:50:51.6652853Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/utility.cuh 2025-05-07T19:50:51.6653313Z 2025-05-07T19:50:51.6653492Z HIPified Source Files: 2025-05-07T19:50:51.6653653Z 2025-05-07T19:50:51.6653732Z 2025-05-07T19:50:51.6653920Z Library Dependencies: 2025-05-07T19:50:51.6654126Z torch 2025-05-07T19:50:51.6654319Z torch_library 2025-05-07T19:50:51.6654711Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so 2025-05-07T19:50:51.6655342Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:51.6655974Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:51.6656707Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 
2025-05-07T19:50:51.6657394Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:51.6657940Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:51.6658316Z 2025-05-07T19:50:51.6658487Z Output Library: 2025-05-07T19:50:51.6658719Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:50:51.6658954Z 2025-05-07T19:50:51.6659148Z Destination Directory: 2025-05-07T19:50:51.6659288Z 2025-05-07T19:50:51.6659402Z ================================================================================ 2025-05-07T19:50:51.6659627Z 2025-05-07T19:50:51.6659631Z 2025-05-07T19:50:51.6659635Z 2025-05-07T19:50:51.6659742Z ================================================================================ 2025-05-07T19:50:51.6660094Z Adding to Package: fbgemm_gpu/experimental/gen_ai 2025-05-07T19:50:51.6660388Z 2025-05-07T19:50:51.6660568Z TARGETS: 2025-05-07T19:50:51.6660765Z fbgemm_gpu_experimental_gen_ai 2025-05-07T19:50:51.6661013Z 2025-05-07T19:50:51.6661180Z FILES: 2025-05-07T19:50:51.6661295Z 2025-05-07T19:50:51.6661400Z ================================================================================ 2025-05-07T19:50:51.6661612Z 2025-05-07T19:50:51.6661616Z 2025-05-07T19:50:51.6661619Z 2025-05-07T19:50:51.6661739Z ================================================================================ 2025-05-07T19:50:51.6662196Z GPU CPP Library Target: fbgemm_gpu_experimental_example_py (SHARED) 2025-05-07T19:50:51.6662565Z 2025-05-07T19:50:51.6662742Z CPU_SRCS: 2025-05-07T19:50:51.6662866Z 2025-05-07T19:50:51.6662940Z 2025-05-07T19:50:51.6663110Z GPU_SRCS: 2025-05-07T19:50:51.6663501Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:50:51.6664213Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:50:51.6664834Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:50:51.6665283Z 2025-05-07T19:50:51.6665475Z 
CUDA_SPECIFIC_SRCS: 2025-05-07T19:50:51.6665619Z 2025-05-07T19:50:51.6665715Z 2025-05-07T19:50:51.6665906Z HIP_SPECIFIC_SRCS: 2025-05-07T19:50:51.6666061Z 2025-05-07T19:50:51.6666141Z 2025-05-07T19:50:51.6666320Z OTHER_SRCS: 2025-05-07T19:50:51.6666456Z 2025-05-07T19:50:51.6666535Z 2025-05-07T19:50:51.6666721Z CC_FLAGS: 2025-05-07T19:50:51.6666851Z 2025-05-07T19:50:51.6666930Z 2025-05-07T19:50:51.6667134Z NVCC_FLAGS: 2025-05-07T19:50:51.6667253Z 2025-05-07T19:50:51.6667333Z 2025-05-07T19:50:51.6667537Z HIPCC_FLAGS: 2025-05-07T19:50:51.6667719Z 2025-05-07T19:50:51.6667801Z 2025-05-07T19:50:51.6668003Z INCLUDE_DIRS: 2025-05-07T19:50:51.6668229Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:51.6668547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu 2025-05-07T19:50:51.6668819Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/include 2025-05-07T19:50:51.6669132Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../include 2025-05-07T19:50:51.6669625Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include 2025-05-07T19:50:51.6670396Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 2025-05-07T19:50:51.6671042Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src 2025-05-07T19:50:51.6671447Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 2025-05-07T19:50:51.6671877Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 2025-05-07T19:50:51.6672341Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 2025-05-07T19:50:51.6672861Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include 2025-05-07T19:50:51.6673329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 2025-05-07T19:50:51.6673875Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 2025-05-07T19:50:51.6674385Z 2025-05-07T19:50:51.6674579Z Selected Source Files: 2025-05-07T19:50:51.6674960Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 
2025-05-07T19:50:51.6675503Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:50:51.6676064Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu 2025-05-07T19:50:51.6676572Z 2025-05-07T19:50:51.6676765Z HIPified Source Files: 2025-05-07T19:50:51.6676906Z 2025-05-07T19:50:51.6676995Z 2025-05-07T19:50:51.6677172Z Library Dependencies: 2025-05-07T19:50:51.6677394Z torch 2025-05-07T19:50:51.6677571Z torch_library 2025-05-07T19:50:51.6677979Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so 2025-05-07T19:50:51.6678597Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so 2025-05-07T19:50:51.6679244Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so 2025-05-07T19:50:51.6679970Z /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 2025-05-07T19:50:51.6680657Z /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 2025-05-07T19:50:51.6681214Z /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so 2025-05-07T19:50:51.6681585Z 2025-05-07T19:50:51.6681772Z Output Library: 2025-05-07T19:50:51.6681992Z fbgemm_gpu_experimental_example_py 2025-05-07T19:50:51.6682254Z 2025-05-07T19:50:51.6682433Z Destination Directory: 2025-05-07T19:50:51.6682590Z 2025-05-07T19:50:51.6682697Z ================================================================================ 2025-05-07T19:50:51.6682985Z 2025-05-07T19:50:51.6682989Z 2025-05-07T19:50:51.6682993Z 2025-05-07T19:50:51.6683120Z ================================================================================ 2025-05-07T19:50:51.6683461Z Adding to Package: fbgemm_gpu/experimental/example 2025-05-07T19:50:51.6683774Z 2025-05-07T19:50:51.6683947Z TARGETS: 2025-05-07T19:50:51.6684189Z fbgemm_gpu_experimental_example_py 2025-05-07T19:50:51.6684454Z 2025-05-07T19:50:51.6684664Z FILES: 
2025-05-07T19:50:51.6684977Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/__init__.py 2025-05-07T19:50:51.6685498Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/example/utils.py 2025-05-07T19:50:51.6685927Z ================================================================================ 2025-05-07T19:50:51.6686145Z 2025-05-07T19:50:51.6686149Z 2025-05-07T19:50:51.6686152Z 2025-05-07T19:50:51.6686266Z ================================================================================ 2025-05-07T19:50:51.6686673Z Adding to Package: fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T19:50:51.6687013Z 2025-05-07T19:50:51.6687272Z TARGETS: 2025-05-07T19:50:51.6687386Z 2025-05-07T19:50:51.6687489Z 2025-05-07T19:50:51.6687670Z FILES: 2025-05-07T19:50:51.6688006Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T19:50:51.6688519Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T19:50:51.6689074Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T19:50:51.6689635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T19:50:51.6690197Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T19:50:51.6690598Z ================================================================================ 2025-05-07T19:50:51.6690836Z 2025-05-07T19:50:51.6690938Z -- Configuring done (7.8s) 2025-05-07T19:50:51.6691238Z -- Generating done (0.0s) 2025-05-07T19:50:51.6691705Z -- Build files have been written to: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build 2025-05-07T19:50:51.6780701Z Change Dir: '/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build' 2025-05-07T19:50:51.6781109Z 2025-05-07T19:50:51.6781411Z Run Build Command(s): /github/home/miniconda/envs/build_binary/bin/ninja -v -j 48 install 2025-05-07T19:50:51.8364844Z [1/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp 2025-05-07T19:50:51.8592866Z [2/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp 2025-05-07T19:50:51.8612159Z [3/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp 2025-05-07T19:50:51.8680777Z [4/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp 2025-05-07T19:50:51.8793801Z [5/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp 2025-05-07T19:50:51.8813240Z [6/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE 
-Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp 2025-05-07T19:50:51.8832010Z [7/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp 2025-05-07T19:50:51.8983693Z [8/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp 2025-05-07T19:50:51.9018250Z [9/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp 2025-05-07T19:50:51.9178907Z [10/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp 2025-05-07T19:50:51.9431522Z [11/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp 2025-05-07T19:50:51.9539553Z [12/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp 
2025-05-07T19:50:51.9594581Z [13/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp 2025-05-07T19:50:51.9613419Z [14/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp 2025-05-07T19:50:51.9647193Z [15/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp 2025-05-07T19:50:51.9690638Z [16/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp 2025-05-07T19:50:51.9830680Z [17/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp 2025-05-07T19:50:51.9913589Z [18/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp 2025-05-07T19:50:51.9923632Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp:10: 2025-05-07T19:50:51.9925421Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:51.9928557Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9932070Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:51.9934021Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9935484Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 
'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:51.9938629Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9942064Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:51.9943815Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9945103Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:51.9948239Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9951891Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:51.9953717Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9955144Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:51.9958127Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9961417Z 135 | 
ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:51.9963115Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9964568Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:51.9967657Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9971197Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:51.9973067Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9974507Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:51.9977541Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9981050Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:51.9983049Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9984575Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:51.9987612Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:51.9991029Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:51.9993052Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:51.9994519Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:51.9997604Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0000934Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.0002772Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0004309Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.0007463Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0011009Z 141 | 
ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.0012873Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0014351Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.0017538Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0021283Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.0023310Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0024034Z At global scope: 2025-05-07T19:50:52.0025301Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.0035724Z [19/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp 2025-05-07T19:50:52.0205845Z [20/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp 2025-05-07T19:50:52.0242919Z [21/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp 2025-05-07T19:50:52.0311783Z [22/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o.d -o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp 2025-05-07T19:50:52.0322055Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp:10: 2025-05-07T19:50:52.0323839Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.0327322Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0330926Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.0332752Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0334259Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.0337507Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0341259Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.0343205Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0344833Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.0347975Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0351567Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.0353417Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0354928Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.0358435Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0362439Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.0364391Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0366052Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.0369402Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0373118Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool 
isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.0375323Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0376956Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.0380428Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0384451Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.0386565Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0388184Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.0391636Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0395568Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.0397659Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0399306Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() 
const': 2025-05-07T19:50:52.0402776Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0406673Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.0408936Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0410811Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.0413933Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0417349Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.0419279Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0420826Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.0424405Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0428079Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() 
const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.0430020Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0430656Z At global scope: 2025-05-07T19:50:52.0431908Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.0511182Z [23/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp 2025-05-07T19:50:52.0674223Z [24/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp 2025-05-07T19:50:52.0694192Z [25/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp 2025-05-07T19:50:52.0820305Z [26/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp 2025-05-07T19:50:52.0861521Z [27/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp 2025-05-07T19:50:52.0872502Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64instdb_p.h:12, 2025-05-07T19:50:52.0873752Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp:13: 2025-05-07T19:50:52.0875845Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.0879104Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0882949Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.0884958Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0886559Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.0893460Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0897422Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() 
const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.0899500Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0901151Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.0904706Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0908564Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.0910793Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0912388Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.0915828Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0919563Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.0921435Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0923050Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.0926504Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0930636Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.0932682Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0934319Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.0937685Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0941542Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.0943658Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0945155Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.0948479Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 
2025-05-07T19:50:52.0952276Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.0954311Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0955877Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.0959255Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0963015Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.0964996Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0966679Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.0970142Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0974087Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.0976313Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0977945Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.0981361Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.0985188Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.0987390Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.0988045Z At global scope: 2025-05-07T19:50:52.0989334Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.1149188Z [28/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp 2025-05-07T19:50:52.1168605Z [29/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp 2025-05-07T19:50:52.1179020Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64instdb_p.h:12, 2025-05-07T19:50:52.1180133Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp:13: 2025-05-07T19:50:52.1181814Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.1185512Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different 
enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1189051Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1190926Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1192490Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.1196075Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1199832Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1201756Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1203312Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.1206531Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1210371Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 
2025-05-07T19:50:52.1212033Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1213460Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.1216090Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1219025Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.1220561Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1221897Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.1224973Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1228489Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1230312Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1231792Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.1235060Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:138:111: warning: 
bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1238811Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1240543Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1241982Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.1245147Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1248771Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1250714Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1252285Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.1255348Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1257364Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == 
(RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.1258439Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1259299Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.1261066Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1263053Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.1264356Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1265222Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.1266983Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1268972Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.1270035Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1270455Z At global scope: 2025-05-07T19:50:52.1271130Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 
2025-05-07T19:50:52.1297611Z [30/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp 2025-05-07T19:50:52.1348225Z [31/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion 
-Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp 2025-05-07T19:50:52.1353865Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/a64archtraits_p.h:13, 2025-05-07T19:50:52.1354560Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp:16: 2025-05-07T19:50:52.1355548Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.1357512Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1359498Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1360562Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1361420Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.1363282Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1365278Z 133 | ASMJIT_INLINE_NODEBUG constexpr 
bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1366329Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1367169Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.1368959Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1370945Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1371995Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1372832Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.1374635Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1376535Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.1377493Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1378351Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool 
asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.1380134Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1382171Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1383227Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1384183Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.1385947Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1388003Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1389053Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1400270Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.1402081Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 
2025-05-07T19:50:52.1404103Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1405157Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1406019Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.1407791Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1410051Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.1411121Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1411990Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.1413765Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1415769Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.1417028Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1417881Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.1419668Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1421663Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.1422715Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1423219Z At global scope: 2025-05-07T19:50:52.1423984Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.1553987Z [32/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp 2025-05-07T19:50:52.1712859Z [33/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp 2025-05-07T19:50:52.1868831Z [34/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include 
-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp 2025-05-07T19:50:52.1874681Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp:12: 2025-05-07T19:50:52.1875680Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.1877570Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1879537Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1880601Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1881425Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.1883170Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 
2025-05-07T19:50:52.1885140Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1886186Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1887020Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.1888755Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1890708Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1891768Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1892599Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.1894329Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1896268Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.1897256Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1898083Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 
'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.1899824Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1901832Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1902878Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1903806Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.1905548Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1907513Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1908564Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1909671Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.1911400Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1913367Z 139 | 
ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1914431Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1915266Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.1916983Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1918947Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.1920003Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1920934Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.1922677Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1924658Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.1925723Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1926657Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h: In member function 'constexpr 
bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.1928406Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1930373Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.1931443Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1931806Z At global scope: 2025-05-07T19:50:52.1932474Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.1938052Z [35/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp 2025-05-07T19:50:52.1943741Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:52.1944507Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:52.1945119Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp:9: 2025-05-07T19:50:52.1946075Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.1947884Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1949948Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1951000Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1951865Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.1953655Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1955724Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 
2025-05-07T19:50:52.1956783Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1957650Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.1959444Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1961454Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1962506Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1963359Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.1965166Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1967097Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.1968075Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1968979Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.1970820Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1972836Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.1973982Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1974884Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.1976693Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1978728Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.1979866Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1980768Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.1982576Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1984686Z 139 | ASMJIT_INLINE_NODEBUG 
constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.1985789Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1986661Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.1988500Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1990540Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.1991608Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1992518Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.1994367Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.1996401Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.1997474Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.1998445Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.2000264Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2002300Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.2003382Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2003745Z At global scope: 2025-05-07T19:50:52.2005919Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.2011772Z [36/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp 2025-05-07T19:50:52.2021889Z [37/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp 2025-05-07T19:50:52.2281622Z [38/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include 
-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp 2025-05-07T19:50:52.2292308Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64instdb_p.h:12, 2025-05-07T19:50:52.2293637Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp:11: 2025-05-07T19:50:52.2295345Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.2298901Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2302785Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.2304973Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2306543Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.2310164Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 
'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2314011Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.2315943Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2317495Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.2320919Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2324708Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.2326685Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2328229Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.2331484Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2335356Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.2337210Z | 
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2338852Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.2342071Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2346024Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.2348046Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2349688Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.2353022Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2356419Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.2358174Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2359677Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.2362953Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:139:111: warning: bitwise 
operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2366763Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.2368818Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2370484Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.2373947Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2377884Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.2380146Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2381819Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.2385416Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2389327Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == 
(RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.2391530Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2393222Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.2396712Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2400613Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.2402691Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2403366Z At global scope: 2025-05-07T19:50:52.2404622Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.2666077Z [39/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp 2025-05-07T19:50:52.2799496Z [40/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp 2025-05-07T19:50:52.2810854Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:52.2812253Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:52.2813441Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp:9: 2025-05-07T19:50:52.2815470Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 
'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.2818875Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2822865Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.2824996Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2826679Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.2829982Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2833793Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.2835640Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2837322Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.2840884Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated 
[-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2844472Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.2846343Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2848013Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.2851603Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2855410Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.2857298Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2858957Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.2862644Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2866575Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.2868576Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2870289Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.2873803Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2877670Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.2879690Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2881337Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.2884868Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2888810Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.2890877Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2892534Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.2896102Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 
'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2900181Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.2902250Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2904017Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.2907326Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2911712Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.2913842Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2915484Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.2919056Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.2923041Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | 
kSignatureElementH2); } 2025-05-07T19:50:52.2925161Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.2925838Z At global scope: 2025-05-07T19:50:52.2927088Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.3031985Z [41/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp 2025-05-07T19:50:52.3218512Z [42/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp 2025-05-07T19:50:52.3344551Z [43/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp 2025-05-07T19:50:52.3401522Z [44/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE 
-Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp 2025-05-07T19:50:52.3412815Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:52.3414215Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64emithelper_p.h:13, 2025-05-07T19:50:52.3415398Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp:14: 2025-05-07T19:50:52.3417106Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.3420348Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3424337Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | 
kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.3426236Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3427802Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.3431013Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3434584Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.3436630Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3438132Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.3441559Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3445215Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.3447023Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3448677Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() 
const': 2025-05-07T19:50:52.3452051Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3455388Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.3457155Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3458581Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.3461703Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3465732Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.3467663Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3469319Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.3472443Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3475936Z 138 | 
ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.3477782Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3479390Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.3482542Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3486018Z 139 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.3488022Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3489675Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.3493145Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3497041Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.3499069Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3500704Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.3504081Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3507643Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.3509625Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3511181Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.3514884Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.3518664Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.3520508Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.3521140Z At global scope: 2025-05-07T19:50:52.3522365Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:52.3681916Z [45/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL 
-DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp 2025-05-07T19:50:52.4245725Z [46/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp 2025-05-07T19:50:52.5297256Z [47/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp 2025-05-07T19:50:52.5660380Z [48/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include 
-DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp 2025-05-07T19:50:52.6980547Z [49/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp 2025-05-07T19:50:52.7941034Z [50/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp 2025-05-07T19:50:52.8871637Z [51/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc 2025-05-07T19:50:52.9244869Z [52/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o 
-c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp 2025-05-07T19:50:52.9253079Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:52.9254091Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:52.9254986Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp:12: 2025-05-07T19:50:52.9256325Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:52.9258774Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9261616Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.9263093Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9264744Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:52.9267309Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9270291Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 
2025-05-07T19:50:52.9271884Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9275368Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:52.9277956Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9280759Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.9282212Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9283447Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:52.9285950Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9288624Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:52.9290004Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9291238Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:52.9293760Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9296520Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:52.9298002Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9299182Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:52.9301878Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9304777Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:52.9306231Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9307445Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:52.9310471Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9313334Z 139 | ASMJIT_INLINE_NODEBUG 
constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:52.9314806Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9316071Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:52.9318614Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9321410Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:52.9322884Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9324115Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:52.9326633Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9329446Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:52.9330950Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9332148Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:52.9334670Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:52.9337718Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:52.9339237Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:52.9339726Z At global scope: 2025-05-07T19:50:52.9340683Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:53.0865887Z [53/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp 2025-05-07T19:50:53.0955900Z [54/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -MF 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc 2025-05-07T19:50:53.2681400Z [55/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp 2025-05-07T19:50:53.3305425Z [56/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp 2025-05-07T19:50:53.3744171Z [57/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp 2025-05-07T19:50:53.4158979Z [58/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp 2025-05-07T19:50:53.4247755Z [59/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtils.cc 2025-05-07T19:50:53.9083064Z [60/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o -c 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp 2025-05-07T19:50:53.9092282Z In file included from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/a64emitter.h:12, 2025-05-07T19:50:53.9093506Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/a64assembler.h:10, 2025-05-07T19:50:53.9094420Z from /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp:18: 2025-05-07T19:50:53.9096041Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB8() const': 2025-05-07T19:50:53.9098985Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:132:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9102615Z 132 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:53.9104438Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9105885Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH4() const': 2025-05-07T19:50:53.9108975Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:133:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9112557Z 133 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 
2025-05-07T19:50:53.9114248Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9115804Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS2() const': 2025-05-07T19:50:53.9118723Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:134:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9122054Z 134 | ASMJIT_INLINE_NODEBUG constexpr bool isVecS2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:53.9123709Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9125086Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD1() const': 2025-05-07T19:50:53.9128013Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:135:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9131085Z 135 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD1() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature); } 2025-05-07T19:50:53.9132641Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9134046Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB16() const': 2025-05-07T19:50:53.9136920Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:137:112: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9140226Z 137 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB16() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB); } 2025-05-07T19:50:53.9142164Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9143658Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH8() const': 2025-05-07T19:50:53.9146558Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:138:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9149707Z 138 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH8() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH); } 2025-05-07T19:50:53.9151619Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9153052Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecS4() const': 2025-05-07T19:50:53.9155905Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:139:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9159094Z 139 | ASMJIT_INLINE_NODEBUG 
constexpr bool isVecS4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementS); } 2025-05-07T19:50:53.9160774Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9162159Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecD2() const': 2025-05-07T19:50:53.9165052Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:140:111: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9168250Z 140 | ASMJIT_INLINE_NODEBUG constexpr bool isVecD2() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementD); } 2025-05-07T19:50:53.9169900Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9171361Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecB4x4() const': 2025-05-07T19:50:53.9174264Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:141:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9177499Z 141 | ASMJIT_INLINE_NODEBUG constexpr bool isVecB4x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementB4); } 2025-05-07T19:50:53.9179133Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9180787Z 
/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h: In member function 'constexpr bool asmjit::_abi_1_13::a64::Vec::isVecH2x4() const': 2025-05-07T19:50:53.9183887Z /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/../arm/../arm/../arm/a64operand.h:142:113: warning: bitwise operation between different enumeration types 'asmjit::_abi_1_13::BaseReg::' and 'asmjit::_abi_1_13::arm::BaseVec::AdditionalBits' is deprecated [-Wdeprecated-enum-enum-conversion] 2025-05-07T19:50:53.9187117Z 142 | ASMJIT_INLINE_NODEBUG constexpr bool isVecH2x4() const noexcept { return _signature.subset(kBaseSignatureMask | kSignatureRegElementTypeMask) == (RegTraits::kSignature | kSignatureElementH2); } 2025-05-07T19:50:53.9188796Z | ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2025-05-07T19:50:53.9189351Z At global scope: 2025-05-07T19:50:53.9190656Z cc1plus: note: unrecognized command-line option '-Wno-deprecated-anon-enum-enum-conversion' may have been intended to silence earlier diagnostics 2025-05-07T19:50:54.4017530Z [61/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -MF 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp 2025-05-07T19:50:54.7038074Z [62/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o.d -o 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o -c /__w/FBGEMM/FBGEMM/src/Utils.cc 2025-05-07T19:50:55.3522952Z [63/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dasmjit_EXPORTS -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -MF CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o.d -o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o -c /__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp 2025-05-07T19:50:55.9260573Z [64/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,asmjit.so 
-o asmjit.so CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/a64rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/arm/armformatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/archtraits.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codeholder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/codewriter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/constpool.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/cpuinfo.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/emitterutils.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/environment.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/errorhandler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/funcargscontext.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/globals.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/inst.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitallocator.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/jitruntime.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/logger.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/osutils.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/ralocal.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rapass.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/rastack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/string.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/support.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/target.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/type.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/virtmem.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zone.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonehash.cpp.o 
CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonelist.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonestack.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonetree.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/core/zonevector.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86assembler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86builder.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86compiler.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86emithelper.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86formatter.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86func.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instapi.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86instdb.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86operand.cpp.o CMakeFiles/asmjit.dir/__w/FBGEMM/FBGEMM/external/asmjit/src/asmjit/x86/x86rapass.cpp.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so" -Wl,--as-needed && : 2025-05-07T19:50:55.9330107Z [65/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T19:50:55.9332168Z ################################################################################ 2025-05-07T19:50:55.9332788Z [CMAKE] Running post-build script ... 2025-05-07T19:50:55.9333609Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T19:50:55.9334513Z Removing all RPATHs ... 
2025-05-07T19:50:55.9335286Z ################################################################################ 2025-05-07T19:50:56.0480678Z [66/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o -c /__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc 2025-05-07T19:50:58.2745971Z [67/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o -c /__w/FBGEMM/FBGEMM/src/RefImplementations.cc 2025-05-07T19:51:00.3756985Z [68/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED 
-DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o -c /__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc 2025-05-07T19:51:01.4890108Z [69/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cpp 2025-05-07T19:51:01.7173748Z [70/156] 
/github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cpp 2025-05-07T19:51:01.8544903Z [71/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -MF 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/attention.cpp 2025-05-07T19:51:02.0024147Z [72/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -MF 
CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc 2025-05-07T19:51:02.0454991Z [73/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cpp 2025-05-07T19:51:03.5400404Z [74/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cpp 2025-05-07T19:51:04.6121506Z [75/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cpp 2025-05-07T19:51:05.0675557Z [76/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o.d -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cpp 2025-05-07T19:51:08.8702887Z [77/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o -c /__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc 2025-05-07T19:51:20.1005421Z [78/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA 
-L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc 2025-05-07T19:52:02.4311984Z [79/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC 
-Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -MF CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o.d -o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o -c /__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc 2025-05-07T19:52:03.4333535Z [80/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode 
arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o 2025-05-07T19:52:03.4353963Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:05.2517734Z [81/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm.so -o fbgemm.so CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDM.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAutovec.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMNBit.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RefImplementations.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/RowWiseSparseAdagradFused.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/SparseAdagrad.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/Utils.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/QuantUtilsAvx2.cc.o CMakeFiles/fbgemm.dir/__w/FBGEMM/FBGEMM/src/EmbeddingSpMDMAvx512.cc.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,"\$ORIGIN" /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so 
asmjit.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so && : 2025-05-07T19:52:06.0987149Z [82/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 1 2025-05-07T19:52:06.0989167Z ################################################################################ 2025-05-07T19:52:06.0989808Z [CMAKE] Running post-build script ... 2025-05-07T19:52:06.0990649Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T19:52:06.0991563Z Resetting RPATH to $ORIGIN ... 
2025-05-07T19:52:06.0992177Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T19:52:06.0992938Z ################################################################################ 2025-05-07T19:52:12.4770895Z [83/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o 2025-05-07T19:52:12.4793275Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:13.2901951Z [84/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o 2025-05-07T19:52:13.2923039Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:52:13.3523124Z [85/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/coalesce/coalesce.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o 2025-05-07T19:52:13.3541679Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:56:53.7864421Z [86/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_lite.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o 2025-05-07T19:56:53.8007817Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:01.6614383Z [87/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/attention/gqa_attn_splitk.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o 2025-05-07T19:57:01.6626395Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:17.3957525Z [88/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o 2025-05-07T19:57:17.3979937Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:17.3982859Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.3984706Z static auto dtype() { 2025-05-07T19:57:17.3985161Z ^ 2025-05-07T19:57:17.3985445Z 2025-05-07T19:57:17.3985874Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.3986521Z 2025-05-07T19:57:17.3987916Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.3989956Z static auto dtype() { 2025-05-07T19:57:17.3990429Z ^ 2025-05-07T19:57:17.3990670Z 2025-05-07T19:57:17.3992204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.3994125Z static auto dtype() { 2025-05-07T19:57:17.3994569Z ^ 2025-05-07T19:57:17.3994829Z 2025-05-07T19:57:17.3996245Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.3998045Z static auto dtype() { 2025-05-07T19:57:17.3998491Z ^ 2025-05-07T19:57:17.3998754Z 2025-05-07T19:57:17.3999198Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.3999839Z 2025-05-07T19:57:17.4001492Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4003258Z static auto dtype() { 2025-05-07T19:57:17.4003733Z ^ 2025-05-07T19:57:17.4003982Z 2025-05-07T19:57:17.4005515Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4007451Z static auto dtype() { 2025-05-07T19:57:17.4007914Z ^ 2025-05-07T19:57:17.4008145Z 2025-05-07T19:57:17.4009763Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4011577Z static auto dtype() { 2025-05-07T19:57:17.4012039Z ^ 2025-05-07T19:57:17.4012309Z 2025-05-07T19:57:17.4012744Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.4013401Z 2025-05-07T19:57:17.4014879Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4016671Z static auto dtype() { 2025-05-07T19:57:17.4017140Z ^ 2025-05-07T19:57:17.4017382Z 2025-05-07T19:57:17.4018845Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4020708Z static auto dtype() { 2025-05-07T19:57:17.4021181Z ^ 2025-05-07T19:57:17.4021411Z 2025-05-07T19:57:17.4022847Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4024756Z static auto dtype() { 2025-05-07T19:57:17.4025183Z ^ 2025-05-07T19:57:17.4025451Z 2025-05-07T19:57:17.4025898Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.4026541Z 2025-05-07T19:57:17.4027931Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning 
#177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4029720Z static auto dtype() { 2025-05-07T19:57:17.4030182Z ^ 2025-05-07T19:57:17.4030418Z 2025-05-07T19:57:17.4031943Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4034053Z static auto dtype() { 2025-05-07T19:57:17.4034522Z ^ 2025-05-07T19:57:17.4034763Z 2025-05-07T19:57:17.4036174Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4038004Z static auto dtype() { 2025-05-07T19:57:17.4038468Z ^ 2025-05-07T19:57:17.4038742Z 2025-05-07T19:57:17.4039197Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.4039853Z 2025-05-07T19:57:17.4041286Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4043076Z static auto dtype() { 2025-05-07T19:57:17.4043551Z ^ 2025-05-07T19:57:17.4043796Z 2025-05-07T19:57:17.4045430Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4047289Z static auto dtype() { 2025-05-07T19:57:17.4047781Z ^ 2025-05-07T19:57:17.4048026Z 2025-05-07T19:57:17.4049448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(202): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4051245Z static auto dtype() { 2025-05-07T19:57:17.4051692Z ^ 2025-05-07T19:57:17.4051962Z 
2025-05-07T19:57:17.4052404Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:17.4053054Z 2025-05-07T19:57:17.4054495Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(195): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4056263Z static auto dtype() { 2025-05-07T19:57:17.4056776Z ^ 2025-05-07T19:57:17.4057018Z 2025-05-07T19:57:17.4058542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/gather_scatter/gather_scatter.cu(188): warning #177-D: function "fbgemm_gpu::::TorchDTypeTrait::dtype" was declared but never referenced 2025-05-07T19:57:17.4060437Z static auto dtype() { 2025-05-07T19:57:17.4060913Z ^ 2025-05-07T19:57:17.4061153Z 2025-05-07T19:57:36.7910044Z [89/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o 2025-05-07T19:57:36.7932122Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:36.7934751Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:36.7936788Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:36.7937573Z ^ 2025-05-07T19:57:36.7937926Z 2025-05-07T19:57:36.7938346Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:36.7938981Z 2025-05-07T19:57:36.7940358Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:36.7942321Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:36.7943065Z ^ 2025-05-07T19:57:36.7943507Z 2025-05-07T19:57:40.3772318Z [90/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o 2025-05-07T19:57:40.3794324Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:57:40.3796937Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:40.3798749Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:57:40.3799539Z ^ 2025-05-07T19:57:40.3799919Z 2025-05-07T19:57:40.3800364Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:57:40.3801029Z 2025-05-07T19:57:40.3802563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:57:40.3804837Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:57:40.3805502Z ^ 2025-05-07T19:57:40.3805794Z 2025-05-07T19:57:44.4379172Z [91/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/moe/index_shuffling.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o 2025-05-07T19:57:44.4399635Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:09.0666744Z [92/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/comm/car.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o 2025-05-07T19:58:09.0688378Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:12.8217441Z [93/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o 2025-05-07T19:58:12.8239268Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:12.8241821Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced
2025-05-07T19:58:12.8243369Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8;
2025-05-07T19:58:12.8243925Z ^
2025-05-07T19:58:12.8244203Z
2025-05-07T19:58:12.8244627Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
2025-05-07T19:58:12.8245225Z
2025-05-07T19:58:12.8246329Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced
2025-05-07T19:58:12.8248053Z constexpr int CVT_FP4_SF_VEC_SIZE = 16;
2025-05-07T19:58:12.8248588Z ^
2025-05-07T19:58:12.8248827Z
2025-05-07T19:58:12.8249935Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced
2025-05-07T19:58:12.8251565Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8;
2025-05-07T19:58:12.8252047Z ^
2025-05-07T19:58:12.8252270Z
2025-05-07T19:58:12.8252671Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
2025-05-07T19:58:12.8253248Z
2025-05-07T19:58:12.8254361Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced
2025-05-07T19:58:12.8255814Z constexpr int CVT_FP4_SF_VEC_SIZE = 16;
2025-05-07T19:58:12.8258139Z ^
2025-05-07T19:58:12.8258369Z
2025-05-07T19:58:12.8259538Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced
2025-05-07T19:58:12.8261110Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8;
2025-05-07T19:58:12.8261640Z ^
2025-05-07T19:58:12.8261885Z
2025-05-07T19:58:12.8262314Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
2025-05-07T19:58:12.8262946Z
2025-05-07T19:58:12.8264170Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced
2025-05-07T19:58:12.8265682Z constexpr int CVT_FP4_SF_VEC_SIZE = 16;
2025-05-07T19:58:12.8266242Z ^
2025-05-07T19:58:12.8266478Z
2025-05-07T19:58:12.8267645Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(147): warning #177-D: variable "fbgemm_gpu::CVT_FP4_ELTS_PER_THREAD" was declared but never referenced
2025-05-07T19:58:12.8269296Z constexpr int CVT_FP4_ELTS_PER_THREAD = 8;
2025-05-07T19:58:12.8270104Z ^
2025-05-07T19:58:12.8270342Z
2025-05-07T19:58:12.8270772Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
2025-05-07T19:58:12.8271429Z
2025-05-07T19:58:12.8272542Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/quantize.cu(148): warning #177-D: variable "fbgemm_gpu::CVT_FP4_SF_VEC_SIZE" was declared but never referenced
2025-05-07T19:58:12.8274025Z constexpr int CVT_FP4_SF_VEC_SIZE = 16;
2025-05-07T19:58:12.8274546Z ^
2025-05-07T19:58:12.8274770Z
2025-05-07T19:58:12.8275782Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored
2025-05-07T19:58:12.8278005Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I13__nv_bfloat16Lb1EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored
2025-05-07T19:58:12.8280229Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb0EEEviiPKT_PKfPjS7_ is out of range. .minnctapersm will be ignored
2025-05-07T19:58:12.8282343Z ptxas warning : Value of threads per SM for entry _ZN10fbgemm_gpu15cvt_fp16_to_fp4I6__halfLb1EEEviiPKT_PKfPjS7_ is out of range.
.minnctapersm will be ignored 2025-05-07T19:58:17.0763017Z [94/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/kv_cache/kv_cache.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o 2025-05-07T19:58:17.0784855Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:17.1316236Z [95/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o 2025-05-07T19:58:17.1339134Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:58:17.1342337Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:17.1344484Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:17.1345274Z ^ 2025-05-07T19:58:17.1345536Z 2025-05-07T19:58:17.1345963Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:17.1346598Z 2025-05-07T19:58:17.1348038Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:17.1350093Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:17.1350895Z ^ 2025-05-07T19:58:17.1351217Z 2025-05-07T19:58:23.2261713Z [96/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o 2025-05-07T19:58:23.2284135Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to 
suppress warning). 2025-05-07T19:58:23.2287172Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:23.2289117Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:23.2289873Z ^ 2025-05-07T19:58:23.2290157Z 2025-05-07T19:58:23.2290555Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:23.2291172Z 2025-05-07T19:58:23.2292554Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:23.2294489Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:23.2295220Z ^ 2025-05-07T19:58:23.2295520Z 2025-05-07T19:58:32.5684787Z [97/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o 2025-05-07T19:58:32.5707282Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets 
to suppress warning). 2025-05-07T19:58:32.5710936Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:32.5712932Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:32.5713670Z ^ 2025-05-07T19:58:32.5713947Z 2025-05-07T19:58:32.5714381Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:32.5714989Z 2025-05-07T19:58:32.5716366Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:32.5718331Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:32.5719080Z ^ 2025-05-07T19:58:32.5719365Z 2025-05-07T19:58:42.3624934Z [98/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o 2025-05-07T19:58:42.3647206Z nvcc warning : Support for offline compilation for architectures prior 
to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:42.3650505Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:42.3652490Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:42.3653224Z ^ 2025-05-07T19:58:42.3653539Z 2025-05-07T19:58:42.3653959Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:42.3654547Z 2025-05-07T19:58:42.3656139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:42.3658097Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:42.3658796Z ^ 2025-05-07T19:58:42.3659042Z 2025-05-07T19:58:43.5544268Z [99/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o 
2025-05-07T19:58:43.5848882Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:43.5851945Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:43.5853952Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:43.5854728Z ^ 2025-05-07T19:58:43.5855013Z 2025-05-07T19:58:43.5855451Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:43.5856011Z 2025-05-07T19:58:43.5857399Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:43.5859365Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:43.5860178Z ^ 2025-05-07T19:58:43.5860474Z 2025-05-07T19:58:45.1353073Z [100/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o 2025-05-07T19:58:45.1375370Z 
nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:45.1378547Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:45.1380457Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:45.1381206Z ^ 2025-05-07T19:58:45.1381469Z 2025-05-07T19:58:45.1381830Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:45.1382399Z 2025-05-07T19:58:45.1383887Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:45.1385777Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:45.1386465Z ^ 2025-05-07T19:58:45.1386767Z 2025-05-07T19:58:49.7984259Z [101/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o 2025-05-07T19:58:49.8006216Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be 
removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:49.8009199Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:49.8011913Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:49.8012709Z ^ 2025-05-07T19:58:49.8013000Z 2025-05-07T19:58:49.8013427Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:49.8014056Z 2025-05-07T19:58:49.8015433Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:49.8017511Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:49.8018280Z ^ 2025-05-07T19:58:49.8018590Z 2025-05-07T19:58:50.6789119Z [102/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_nccl.cpp 2025-05-07T19:58:54.3120745Z [103/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o 
2025-05-07T19:58:54.3141303Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:58:54.3143961Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:54.3145849Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:58:54.3146588Z ^ 2025-05-07T19:58:54.3146858Z 2025-05-07T19:58:54.3147246Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:58:54.3147837Z 2025-05-07T19:58:54.3149141Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:58:54.3151021Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:58:54.3151705Z ^ 2025-05-07T19:58:54.3152006Z 2025-05-07T19:59:00.4891801Z [104/156] /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -std=c++20 -fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -mavx2 -mf16c -mfma -fopenmp -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o.d -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/example_ops.cpp 2025-05-07T19:59:04.7298698Z [105/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/include/fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o 2025-05-07T19:59:04.7320640Z nvcc warning : Support for offline compilation for architectures prior to 
'_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:07.5721356Z [106/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o 2025-05-07T19:59:07.5743086Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T19:59:07.5745958Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:07.5747965Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:07.5748787Z ^ 2025-05-07T19:59:07.5749102Z 2025-05-07T19:59:07.5749580Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:07.5750219Z 2025-05-07T19:59:07.5751615Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:07.5753724Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:07.5754512Z ^ 2025-05-07T19:59:07.5754814Z 2025-05-07T19:59:07.5765894Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:07.5788372Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_10multipliesES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_IS1S_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:07.5811169Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:07.5835543Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_fLNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1Q_INS1R_INS_10multipliesEffLS1T_2EvEEJS1X_NS1Q_IS1Z_JNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:07.5858873Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEES11_NS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:07.5882615Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi2EEENSA_ILi1EEESC_EEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_13SM90_TMA_LOADENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityENS4_23SM90_TMA_LOAD_MULTICASTES1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1G_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1N_NS1G_6fusion15Sm90TreeVisitorINS1P_11Sm90ComputeINS_4plusES1O_S1O_LNS_15FloatRoundStyleE2EvEEJNS1P_16Sm90ColBroadcastILi0ESI_S1O_S1O_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1Q_INS1R_INS_10multipliesES1O_fLS1T_2EvEEJNS1V_ILi0ESI_ffS1W_Li4ELb1EEENS1Q_INS1R_IS1Y_ffLS1T_2EvEEJNS1P_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1P_12Sm90AccFetchEEEEEEEEEES12_NS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:07.7364853Z [107/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o 2025-05-07T19:59:07.7385340Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:07.7388088Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:07.7390552Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:07.7391258Z ^ 2025-05-07T19:59:07.7391547Z 2025-05-07T19:59:07.7391966Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:07.7392540Z 2025-05-07T19:59:07.7393900Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:07.7395877Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:07.7396626Z ^ 2025-05-07T19:59:07.7396910Z 2025-05-07T19:59:08.6453741Z [108/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o 2025-05-07T19:59:08.6474941Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:08.6477694Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:08.6480374Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:08.6481029Z ^ 2025-05-07T19:59:08.6481349Z 2025-05-07T19:59:08.6481777Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:08.6482400Z 2025-05-07T19:59:08.6483765Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:08.6485764Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:08.6486463Z ^ 2025-05-07T19:59:08.6486795Z 2025-05-07T19:59:08.6497460Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_10multipliesES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_IS1Q_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S25_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES29_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:08.6519881Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_10multipliesES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_IS1R_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S26_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2A_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:08.6542315Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_fLNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SV_SV_EEELi4ELb1EEENS1O_INS1P_INS_10multipliesEffLS1R_2EvEEJS1V_NS1O_IS1X_JNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S27_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2B_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:08.6565194Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_fLNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_ffNS5_IJSC_SW_SW_EEELi4ELb1EEENS1P_INS1Q_INS_10multipliesEffLS1S_2EvEEJS1W_NS1P_IS1Y_JNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S28_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2C_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:08.6588504Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEESJ_SK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E4M3_SS_TNILNSO_7ScaleInE1ELSQ_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESV_SV_EEEEENS5_IJNS4_10UnderscoreESY_SY_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENST_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES11_S1B_vS1C_EENS_8epilogue10collective18CollectiveEpilogueINS1E_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1L_NS1E_6fusion15Sm90TreeVisitorINS1N_11Sm90ComputeINS_4plusES1M_S1M_LNS_15FloatRoundStyleE2EvEEJNS1N_16Sm90ColBroadcastILi0ESI_S1M_S1M_NS5_IJSC_SV_SV_EEELi8ELb1EEENS1O_INS1P_INS_10multipliesES1M_fLS1R_2EvEEJNS1T_ILi0ESI_ffS1U_Li4ELb1EEENS1O_INS1P_IS1W_ffLS1R_2EvEEJNS1N_16Sm90RowBroadcastILi0ESI_ffNS5_IJSV_SC_SV_EEELi4ELb1EEENS1N_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS12_IS14_NS15_ILi16EEENST_INS5_IJNSA_ILi64EEES17_EEENS5_IJSC_S29_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2D_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:08.6614309Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function 
'_ZN7cutlass13device_kernelINS_4gemm6kernel13GemmUniversalIN4cute5tupleIJiiiEEENS1_10collective13CollectiveMmaINS1_37MainloopSm90TmaGmmaWarpSpecializedFP8ILi4ENS5_IJNS4_1CILi4EEESB_NSA_ILi1EEEEEENS1_24KernelTmaWarpSpecializedEEENS5_IJNSA_ILi128EEENSA_ILi256EEESG_EEENS_12float_e4m3_tENS5_IJlSC_lEEENS_12float_e5m2_tESK_NS4_8TiledMMAINS4_8MMA_AtomIJNS4_4SM904GMMA31MMA_64x256x32_F32E4M3E5M2_SS_TNILNSP_7ScaleInE1ELSR_1EEEEEENS4_6LayoutINS5_IJSC_SC_SC_EEENS5_IJNSA_ILi0EEESW_SW_EEEEENS5_IJNS4_10UnderscoreESZ_SZ_EEEEENS4_23SM90_TMA_LOAD_MULTICASTENS4_14ComposedLayoutINS4_7SwizzleILi3ELi4ELi3EEENS4_18smem_ptr_flag_bitsILi8EEENSU_INS5_IJNSA_ILi8EEESG_EEENS5_IJSG_SC_EEEEEEEvNS4_8identityES12_S1C_vS1D_EENS_8epilogue10collective18CollectiveEpilogueINS1F_22Sm90TmaWarpSpecializedILi4ELi2ELi16ELb0ELb1EEEJSI_NS5_IJSG_NSA_ILi32EEEEEEvNS5_IJSC_llEEENS_10bfloat16_tES1M_NS1F_6fusion15Sm90TreeVisitorINS1O_11Sm90ComputeINS_4plusES1N_S1N_LNS_15FloatRoundStyleE2EvEEJNS1O_16Sm90ColBroadcastILi0ESI_S1N_S1N_NS5_IJSC_SW_SW_EEELi8ELb1EEENS1P_INS1Q_INS_10multipliesES1N_fLS1S_2EvEEJNS1U_ILi0ESI_ffS1V_Li4ELb1EEENS1P_INS1Q_IS1X_ffLS1S_2EvEEJNS1O_16Sm90RowBroadcastILi0ESI_ffNS5_IJSW_SC_SW_EEELi4ELb1EEENS1O_12Sm90AccFetchEEEEEEEEEENS4_13SM90_TMA_LOADENS13_IS15_NS16_ILi16EEENSU_INS5_IJNSA_ILi64EEES18_EEENS5_IJSC_S2A_EEEEEEENS4_17SM75_U16x8_LDSM_TENS4_14SM90_TMA_STOREES2E_NS4_17SM90_U16x8_STSM_TENS4_9Copy_AtomIJNS4_17SM90_U32x4_STSM_NENS_6half_tEEEEvEEEvvEEEEvNT_6ParamsE' 2025-05-07T19:59:11.2484953Z [109/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o 2025-05-07T19:59:11.2497435Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:11.2499146Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:11.2500342Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:11.2500823Z ^ 2025-05-07T19:59:11.2500995Z 2025-05-07T19:59:11.2501235Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:11.2501591Z 2025-05-07T19:59:11.2502426Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:11.2503710Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:11.2504178Z ^ 2025-05-07T19:59:11.2504357Z 2025-05-07T19:59:46.1854586Z [110/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o 2025-05-07T19:59:46.1875945Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:46.1878972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:46.1880865Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:46.1881589Z ^ 2025-05-07T19:59:46.1881870Z 2025-05-07T19:59:46.1882253Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:46.1882831Z 2025-05-07T19:59:46.1884202Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:46.1886200Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:46.1886989Z ^ 2025-05-07T19:59:46.1887275Z 2025-05-07T19:59:47.6890913Z [111/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o 2025-05-07T19:59:47.6911391Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:47.6913931Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:47.6916111Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T19:59:47.6916880Z ^ 2025-05-07T19:59:47.6917162Z 2025-05-07T19:59:47.6917525Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T19:59:47.6918054Z 2025-05-07T19:59:47.6919340Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T19:59:47.6921150Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T19:59:47.6921875Z ^ 2025-05-07T19:59:47.6922183Z 2025-05-07T19:59:48.1668479Z [112/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16_fast_gemv.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o 2025-05-07T19:59:48.1690597Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T19:59:57.8100616Z [113/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a 
-gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o 2025-05-07T19:59:57.8121719Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:02.7630680Z [114/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o 2025-05-07T20:00:02.7654068Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:02.7656892Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.7658916Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:02.7659726Z ^ 2025-05-07T20:00:02.7660025Z 2025-05-07T20:00:02.7660467Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:02.7661148Z 2025-05-07T20:00:02.7662670Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:02.7664980Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:02.7665757Z ^ 2025-05-07T20:00:02.7666189Z 2025-05-07T20:00:11.4323785Z [115/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o 2025-05-07T20:00:11.4343485Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:11.4346183Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:11.4348246Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:11.4349029Z ^ 2025-05-07T20:00:11.4349319Z 2025-05-07T20:00:11.4349754Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:11.4350344Z 2025-05-07T20:00:11.4351535Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:11.4353627Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:11.4354426Z ^ 2025-05-07T20:00:11.4354742Z 2025-05-07T20:00:12.1789478Z [116/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/i8i8bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o 2025-05-07T20:00:12.1809999Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:12.1811862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:12.1813064Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:12.1813561Z ^ 2025-05-07T20:00:12.1813747Z 2025-05-07T20:00:12.1814001Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:12.1814398Z 2025-05-07T20:00:12.1815211Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:12.1816356Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:12.1816832Z ^ 2025-05-07T20:00:12.1817011Z 2025-05-07T20:00:14.0501884Z [117/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o 2025-05-07T20:00:14.0524121Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:14.0526962Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.0529025Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.0529804Z ^ 2025-05-07T20:00:14.0530141Z 2025-05-07T20:00:14.0530566Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.0531180Z 2025-05-07T20:00:14.0532629Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.0534615Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:14.0535462Z ^ 2025-05-07T20:00:14.0535786Z 2025-05-07T20:00:14.0537273Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.0539428Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.0540310Z ^ 2025-05-07T20:00:14.0540767Z detected during: 2025-05-07T20:00:14.0566589Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.0615308Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.0665078Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.0693218Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.0695424Z 2025-05-07T20:00:14.0696112Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.0696777Z 2025-05-07T20:00:14.0698154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.0700128Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.0700835Z ^ 2025-05-07T20:00:14.0701273Z detected during: 2025-05-07T20:00:14.0724996Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:14.0775182Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.0823887Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.0872164Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.0899701Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.0901661Z 2025-05-07T20:00:14.0903049Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.0905197Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.0906004Z ^ 2025-05-07T20:00:14.0906794Z detected during: 2025-05-07T20:00:14.0931906Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.0976939Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.1022262Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.1049879Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.1051637Z 2025-05-07T20:00:14.1052013Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.1052607Z 2025-05-07T20:00:14.1054075Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1055903Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1056632Z ^ 2025-05-07T20:00:14.1057034Z detected during: 2025-05-07T20:00:14.1080065Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:14.1129225Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.1175249Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.1223478Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.1250770Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.1253247Z 2025-05-07T20:00:14.1254798Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1256892Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1257765Z ^ 2025-05-07T20:00:14.1258259Z detected during: 2025-05-07T20:00:14.1284496Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.1331301Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.1380098Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.1405014Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.1407084Z 2025-05-07T20:00:14.1407551Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.1408189Z 2025-05-07T20:00:14.1409788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1411798Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1412450Z ^ 2025-05-07T20:00:14.1412857Z detected during: 2025-05-07T20:00:14.1436582Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:14.1484070Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.1531545Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.1579560Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.1607047Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.1609340Z 2025-05-07T20:00:14.1611932Z ptxas /tmp/tmpxft_00007e61_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:14.1616157Z ptxas /tmp/tmpxft_00007e61_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:14.1620601Z ptxas /tmp/tmpxft_00007e61_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:14.1625114Z ptxas /tmp/tmpxft_00007e61_00000000-9_f4f4bf16_128_128_4_1_1_f.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:14.1628669Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1630563Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1631315Z ^ 2025-05-07T20:00:14.1631730Z detected during: 2025-05-07T20:00:14.1656196Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.1704077Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.1753923Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.1781819Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.1784108Z 2025-05-07T20:00:14.1784599Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.1785198Z 2025-05-07T20:00:14.1786682Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1788531Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1789195Z ^ 2025-05-07T20:00:14.1789565Z detected during: 2025-05-07T20:00:14.1814111Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:14.1862033Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:14.1909233Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.1958404Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.1986126Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.1988174Z 2025-05-07T20:00:14.1989468Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.1991379Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.1992127Z ^ 2025-05-07T20:00:14.1992570Z detected during: 2025-05-07T20:00:14.2016721Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.2044397Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.2072426Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.2088257Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.2089392Z 2025-05-07T20:00:14.2089635Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.2089997Z 2025-05-07T20:00:14.2090783Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.2091905Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.2092312Z ^ 2025-05-07T20:00:14.2092535Z detected during: 2025-05-07T20:00:14.2106036Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:14.2133791Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.2161039Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.2188695Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.2205025Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.2206204Z 2025-05-07T20:00:14.2206997Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.2208103Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.2208545Z ^ 2025-05-07T20:00:14.2208965Z detected during: 2025-05-07T20:00:14.2223411Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.2250506Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.2278081Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.2293711Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.2294819Z 2025-05-07T20:00:14.2295054Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:14.2295410Z 2025-05-07T20:00:14.2296188Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:14.2297281Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:14.2297675Z ^ 2025-05-07T20:00:14.2297906Z detected during: 2025-05-07T20:00:14.2311612Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:14.2339004Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:14.2366117Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:14.2393731Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:14.2409622Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu 2025-05-07T20:00:14.2410744Z 2025-05-07T20:00:15.1085586Z [118/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o 2025-05-07T20:00:15.1110041Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:15.1113072Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1116749Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.1117624Z ^ 2025-05-07T20:00:15.1117959Z 2025-05-07T20:00:15.1118469Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.1119162Z 2025-05-07T20:00:15.1120782Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1123090Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:15.1123947Z ^ 2025-05-07T20:00:15.1124307Z 2025-05-07T20:00:15.1125835Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1128227Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.1129082Z ^ 2025-05-07T20:00:15.1129595Z detected during: 2025-05-07T20:00:15.1158400Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.1213657Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.1269593Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.1301337Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.1303708Z 2025-05-07T20:00:15.1304185Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.1304874Z 2025-05-07T20:00:15.1306404Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1308636Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.1309546Z ^ 2025-05-07T20:00:15.1310017Z detected during: 2025-05-07T20:00:15.1337459Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:15.1393401Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.1449412Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.1504717Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.1534466Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.1536487Z 2025-05-07T20:00:15.1537931Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1539975Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.1540757Z ^ 2025-05-07T20:00:15.1541255Z detected during: 2025-05-07T20:00:15.1568328Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.1622963Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.1679751Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.1712438Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.1714869Z 2025-05-07T20:00:15.1715308Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.1716226Z 2025-05-07T20:00:15.1717794Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1719965Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.1720725Z ^ 2025-05-07T20:00:15.1721130Z detected during: 2025-05-07T20:00:15.1749704Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:15.1805549Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.1862433Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.1919822Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.1952017Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.1954296Z 2025-05-07T20:00:15.1955872Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.1958155Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.1959003Z ^ 2025-05-07T20:00:15.1959508Z detected during: 2025-05-07T20:00:15.1989317Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.2045731Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2102985Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.2135273Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.2137517Z 2025-05-07T20:00:15.2137952Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.2138652Z 2025-05-07T20:00:15.2140230Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.2142601Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.2143457Z ^ 2025-05-07T20:00:15.2143883Z detected during: 2025-05-07T20:00:15.2170909Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:15.2227057Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.2283796Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2335450Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.2367047Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, 
TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.2369224Z 2025-05-07T20:00:15.2371710Z ptxas /tmp/tmpxft_00007e66_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 925; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:15.2376572Z ptxas /tmp/tmpxft_00007e66_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 937; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:15.2381516Z ptxas /tmp/tmpxft_00007e66_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1076; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:15.2387543Z ptxas /tmp/tmpxft_00007e66_00000000-9_f4f4bf16_128_128_4_1_1_t.compute_90.ptx, line 1088; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:00:15.2391876Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.2394073Z 
Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.2394921Z ^ 2025-05-07T20:00:15.2395419Z detected during: 2025-05-07T20:00:15.2424472Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.2480446Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2537038Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.2569506Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.2571815Z 2025-05-07T20:00:15.2572304Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.2573018Z 2025-05-07T20:00:15.2574585Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.2576759Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.2577581Z ^ 2025-05-07T20:00:15.2578004Z detected during: 2025-05-07T20:00:15.2605750Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:15.2635749Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:00:15.2663128Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 
of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2690828Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.2706650Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.2707775Z 2025-05-07T20:00:15.2708594Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.2709934Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.2710399Z ^ 2025-05-07T20:00:15.2710669Z detected during: 2025-05-07T20:00:15.2725037Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.2752345Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2779952Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.2795713Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, 
USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.2796818Z 2025-05-07T20:00:15.2797063Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.2797441Z 2025-05-07T20:00:15.2798230Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.2799428Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.2799831Z ^ 2025-05-07T20:00:15.2800087Z detected during: 2025-05-07T20:00:15.2813793Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:15.2841364Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.2868591Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2896271Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.2912152Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.2913282Z 2025-05-07T20:00:15.2914067Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning 
#2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.2915211Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.2915655Z ^ 2025-05-07T20:00:15.2915948Z detected during: 2025-05-07T20:00:15.2930300Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.2957523Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.2985234Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, 
cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.3000933Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.3002057Z 2025-05-07T20:00:15.3002300Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:15.3002658Z 2025-05-07T20:00:15.3003465Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:15.3004561Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:15.3004994Z ^ 2025-05-07T20:00:15.3005230Z detected during: 2025-05-07T20:00:15.3018809Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=10, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:15.3046459Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:15.3073674Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:15.3101308Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:15.3117170Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=128, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu 2025-05-07T20:00:15.3118302Z 2025-05-07T20:00:20.2985861Z [119/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o 2025-05-07T20:00:29.5367877Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:29.5378939Z [120/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_example_py_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda 
-O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -MF experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/example/src/cutlass_sgemm_nn.cu -o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o 2025-05-07T20:00:29.5390683Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:29.7019388Z [121/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o 2025-05-07T20:00:29.7031428Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:29.7032960Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:29.7034086Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:29.7034513Z ^ 2025-05-07T20:00:29.7034682Z 2025-05-07T20:00:29.7034932Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:29.7035275Z 2025-05-07T20:00:29.7036068Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:29.7037425Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:29.7037869Z ^ 2025-05-07T20:00:29.7038037Z 2025-05-07T20:00:30.8824964Z [122/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_example_py.so -o experimental/example/fbgemm_gpu_experimental_example_py.so experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_nccl.cpp.o experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/example_ops.cpp.o 
experimental/example/CMakeFiles/fbgemm_gpu_experimental_example_py.dir/src/cutlass_sgemm_nn.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -lrt -lpthread -ldl && : 2025-05-07T20:00:31.0354313Z [123/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/experimental/example && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 
2025-05-07T20:00:31.0355845Z ################################################################################ 2025-05-07T20:00:31.0356215Z [CMAKE] Running post-build script ... 2025-05-07T20:00:31.0356961Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:00:31.0357662Z Removing all RPATHs ... 2025-05-07T20:00:31.0357943Z ################################################################################ 2025-05-07T20:00:42.6386436Z [124/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include 
-DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/mixed_dtype_utils.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o 2025-05-07T20:00:42.6398575Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:47.7621856Z [125/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o 2025-05-07T20:00:47.7644423Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:47.7647226Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.7649277Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.7650094Z ^ 2025-05-07T20:00:47.7650409Z 2025-05-07T20:00:47.7650842Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:47.7651498Z 2025-05-07T20:00:47.7652953Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.7654947Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:47.7655692Z ^ 2025-05-07T20:00:47.7656025Z 2025-05-07T20:00:47.7657470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.7659539Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.7660293Z ^ 2025-05-07T20:00:47.7660781Z detected during: 2025-05-07T20:00:47.7687480Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.7737308Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.7788795Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.7818271Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.7820297Z 2025-05-07T20:00:47.7820752Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:47.7821404Z 2025-05-07T20:00:47.7823110Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.7825198Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.7825931Z ^ 2025-05-07T20:00:47.7826336Z detected during: 2025-05-07T20:00:47.7851243Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:47.7901773Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.7952309Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.8005305Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.8034833Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.8036828Z 2025-05-07T20:00:47.8038288Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:00:47.8040414Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.8041222Z ^ 2025-05-07T20:00:47.8041716Z detected during: 2025-05-07T20:00:47.8068487Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.8119420Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.8169909Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.8198742Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.8200730Z 2025-05-07T20:00:47.8201208Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:47.8201829Z 2025-05-07T20:00:47.8203297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.8205279Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.8205974Z ^ 2025-05-07T20:00:47.8220426Z detected during: 2025-05-07T20:00:47.8245348Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:47.8296297Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.8347294Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.8398375Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.8426782Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.8428967Z 2025-05-07T20:00:47.8430371Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.8432354Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.8433097Z ^ 2025-05-07T20:00:47.8433524Z detected during: 2025-05-07T20:00:47.8459442Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.8509687Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.8562227Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.8590163Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.8592224Z 2025-05-07T20:00:47.8592655Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:47.8593318Z 2025-05-07T20:00:47.8594755Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.8596717Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.8597394Z ^ 2025-05-07T20:00:47.8597776Z detected during: 2025-05-07T20:00:47.8622728Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:47.8671630Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.8720482Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.8769565Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.8797657Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.8799644Z 2025-05-07T20:00:47.8801022Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.8803058Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.8803763Z ^ 2025-05-07T20:00:47.8804170Z detected during: 2025-05-07T20:00:47.8821316Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.8849010Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.8876854Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.8892602Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.8893722Z 2025-05-07T20:00:47.8893957Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:47.8894305Z 2025-05-07T20:00:47.8895090Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.8896248Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.8896657Z ^ 2025-05-07T20:00:47.8896876Z detected during: 2025-05-07T20:00:47.8910641Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:47.8938223Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.8965657Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:00:47.8994946Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.9010887Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.9011992Z 2025-05-07T20:00:47.9012785Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.9013892Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.9014335Z ^ 2025-05-07T20:00:47.9014590Z detected during: 2025-05-07T20:00:47.9029098Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.9056537Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.9084510Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.9100375Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.9101489Z 2025-05-07T20:00:47.9101765Z Remark: The warnings can 
be suppressed with "-diag-suppress <warning-number>" 2025-05-07T20:00:47.9102125Z 2025-05-07T20:00:47.9102913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.9104246Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.9104663Z ^ 2025-05-07T20:00:47.9104947Z detected during: 2025-05-07T20:00:47.9118631Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:47.9146512Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, 
cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.9173936Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.9201817Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.9217790Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.9218894Z 2025-05-07T20:00:47.9219669Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.9220786Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.9221225Z ^ 2025-05-07T20:00:47.9221476Z detected during: 2025-05-07T20:00:47.9236008Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.9263426Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.9292750Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.9308868Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.9310010Z 2025-05-07T20:00:47.9310255Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>" 2025-05-07T20:00:47.9310644Z 2025-05-07T20:00:47.9311437Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:47.9312542Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:47.9312975Z ^ 2025-05-07T20:00:47.9313241Z detected during: 2025-05-07T20:00:47.9326778Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, 
TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:47.9354585Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:47.9480123Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:47.9508220Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments 
&, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:47.9524183Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu 2025-05-07T20:00:47.9525356Z 2025-05-07T20:00:48.7832673Z [126/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o 2025-05-07T20:00:48.7855678Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:00:48.7858454Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.7860630Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.7861441Z ^ 2025-05-07T20:00:48.7861752Z 2025-05-07T20:00:48.7862220Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.7862877Z 2025-05-07T20:00:48.7864474Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.7866832Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:48.7867670Z ^ 2025-05-07T20:00:48.7867965Z 2025-05-07T20:00:48.7869467Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.7871615Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.7872422Z ^ 2025-05-07T20:00:48.7872920Z detected during: 2025-05-07T20:00:48.7900025Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.7953241Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.8006736Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.8036439Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.8038417Z 2025-05-07T20:00:48.8038854Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.8039508Z 2025-05-07T20:00:48.8040977Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.8043289Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.8044027Z ^ 2025-05-07T20:00:48.8044407Z detected during: 2025-05-07T20:00:48.8069388Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:48.8120689Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.8175693Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.8228069Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.8256539Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.8258583Z 2025-05-07T20:00:48.8260089Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.8262359Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.8263165Z ^ 2025-05-07T20:00:48.8263685Z detected during: 2025-05-07T20:00:48.8290286Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.8342145Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.8395289Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.8424415Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.8426392Z 2025-05-07T20:00:48.8426838Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.8427455Z 2025-05-07T20:00:48.8428841Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.8430921Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.8431567Z ^ 2025-05-07T20:00:48.8431961Z detected during: 2025-05-07T20:00:48.8456655Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, 
TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:48.8509804Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.8560798Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.8612634Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.8642755Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.8644870Z 2025-05-07T20:00:48.8646215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.8648390Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.8649158Z ^ 2025-05-07T20:00:48.8649619Z detected during: 2025-05-07T20:00:48.8676658Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.8728718Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.8770134Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.8792610Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.8794217Z 2025-05-07T20:00:48.8794553Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.8795111Z 2025-05-07T20:00:48.8796212Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.8797789Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.8798525Z ^ 2025-05-07T20:00:48.8798896Z detected during: 2025-05-07T20:00:48.8819362Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:48.8866635Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.8917875Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.8969576Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.8999280Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, 
TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9001499Z 2025-05-07T20:00:48.9002963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.9005056Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.9005808Z ^ 2025-05-07T20:00:48.9006294Z detected during: 2025-05-07T20:00:48.9033785Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.9082567Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.9133795Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.9162609Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9164724Z 2025-05-07T20:00:48.9165173Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.9165813Z 2025-05-07T20:00:48.9167245Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.9169333Z return 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.9170099Z ^ 2025-05-07T20:00:48.9170505Z detected during: 2025-05-07T20:00:48.9195629Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:48.9231213Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.9258748Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.9288231Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.9304183Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9305377Z 2025-05-07T20:00:48.9306186Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.9307311Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.9307777Z ^ 2025-05-07T20:00:48.9308070Z detected during: 2025-05-07T20:00:48.9322810Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.9350328Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.9378429Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.9394309Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9395434Z 2025-05-07T20:00:48.9395711Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.9396067Z 2025-05-07T20:00:48.9396852Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.9398017Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.9398445Z ^ 2025-05-07T20:00:48.9398679Z detected during: 2025-05-07T20:00:48.9412496Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:48.9440315Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.9467750Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.9495662Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.9511822Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9512948Z 2025-05-07T20:00:48.9513741Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.9514900Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.9515386Z ^ 2025-05-07T20:00:48.9515666Z detected during: 2025-05-07T20:00:48.9530131Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.9557650Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.9586835Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.9602834Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9603952Z 2025-05-07T20:00:48.9604204Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:48.9604607Z 2025-05-07T20:00:48.9605397Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:48.9606570Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:48.9606983Z ^ 2025-05-07T20:00:48.9607259Z detected during: 2025-05-07T20:00:48.9621109Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:00:48.9649014Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:00:48.9676538Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:00:48.9704508Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:00:48.9720480Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu 2025-05-07T20:00:48.9721641Z 2025-05-07T20:00:57.9401089Z [127/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o 2025-05-07T20:00:57.9414035Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:00:57.9415593Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:57.9416728Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:00:57.9417168Z ^ 2025-05-07T20:00:57.9417357Z 2025-05-07T20:00:57.9417596Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:00:57.9417947Z 2025-05-07T20:00:57.9418766Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:00:57.9419898Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:00:57.9420343Z ^ 2025-05-07T20:00:57.9420514Z 2025-05-07T20:01:02.7063778Z [128/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o 2025-05-07T20:01:02.7086195Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:02.7089113Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7091173Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7091981Z ^ 2025-05-07T20:01:02.7092294Z 2025-05-07T20:01:02.7092790Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.7093458Z 2025-05-07T20:01:02.7094836Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7097019Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:02.7097875Z ^ 2025-05-07T20:01:02.7098174Z 2025-05-07T20:01:02.7099601Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7101750Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7102496Z ^ 2025-05-07T20:01:02.7102949Z detected during: 2025-05-07T20:01:02.7129846Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.7178928Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.7227769Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.7256536Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.7258647Z 2025-05-07T20:01:02.7259091Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.7259691Z 2025-05-07T20:01:02.7261084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7263060Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7263887Z ^ 2025-05-07T20:01:02.7264328Z detected during: 2025-05-07T20:01:02.7288304Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, 
AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:02.7338093Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.7386595Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.7435779Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void 
*, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.7464159Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.7466449Z 2025-05-07T20:01:02.7467806Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7469795Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7470574Z ^ 2025-05-07T20:01:02.7471002Z detected during: 2025-05-07T20:01:02.7496476Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.7546068Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.7595113Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.7624410Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.7626538Z 2025-05-07T20:01:02.7627001Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.7627638Z 2025-05-07T20:01:02.7629008Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7630948Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7631683Z ^ 2025-05-07T20:01:02.7632118Z detected during: 2025-05-07T20:01:02.7656699Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:02.7705850Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.7735128Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.7763179Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.7779032Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.7780205Z 2025-05-07T20:01:02.7781003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7782153Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7782600Z ^ 2025-05-07T20:01:02.7782894Z detected during: 2025-05-07T20:01:02.7797352Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.7825161Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.7853108Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.7868980Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.7870116Z 2025-05-07T20:01:02.7870362Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.7870721Z 2025-05-07T20:01:02.7871530Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7872625Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7873100Z ^ 2025-05-07T20:01:02.7873344Z detected during: 2025-05-07T20:01:02.7886852Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:02.7914874Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.7942346Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.7970296Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.7986156Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.7987270Z 2025-05-07T20:01:02.7988154Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.7989285Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.7989744Z ^ 2025-05-07T20:01:02.7990034Z detected during: 2025-05-07T20:01:02.8004436Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.8032157Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.8060081Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.8075937Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 
22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.8077051Z 2025-05-07T20:01:02.8077316Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.8077674Z 2025-05-07T20:01:02.8078458Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.8079574Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.8080062Z ^ 2025-05-07T20:01:02.8080302Z detected during: 2025-05-07T20:01:02.8093815Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:02.8121787Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.8149679Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.8177604Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.8193569Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.8194685Z 2025-05-07T20:01:02.8195476Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of 
"this" is deprecated 2025-05-07T20:01:02.8196631Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.8197104Z ^ 2025-05-07T20:01:02.8197381Z detected during: 2025-05-07T20:01:02.8212077Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.8239642Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, 
cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.8267547Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.8283370Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.8284480Z 2025-05-07T20:01:02.8284724Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.8285102Z 2025-05-07T20:01:02.8285893Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.8287012Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.8287423Z ^ 2025-05-07T20:01:02.8287678Z detected during: 2025-05-07T20:01:02.8301162Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, 
SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:02.8329145Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.8356611Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.8384537Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.8400310Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.8401484Z 2025-05-07T20:01:02.8402273Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.8403416Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.8403859Z ^ 2025-05-07T20:01:02.8404146Z detected during: 2025-05-07T20:01:02.8418713Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, 
cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.8446335Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.8474277Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.8490074Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, 
at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.8491205Z 2025-05-07T20:01:02.8491448Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:02.8491801Z 2025-05-07T20:01:02.8492607Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:02.8493699Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:02.8494131Z ^ 2025-05-07T20:01:02.8494364Z detected during: 2025-05-07T20:01:02.8507971Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:02.8536271Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:02.8563769Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, 
cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:02.8591870Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:02.8607654Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu 2025-05-07T20:01:02.8609008Z 2025-05-07T20:01:16.0845088Z [129/156] /github/home/miniconda/envs/build_binary/bin/nvcc 
-forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 
-Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o 2025-05-07T20:01:16.0857296Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:16.0858910Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:16.0860029Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:16.0860470Z ^ 2025-05-07T20:01:16.0860635Z 2025-05-07T20:01:16.0860870Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:16.0861226Z 2025-05-07T20:01:16.0862022Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:16.0863149Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:16.0863682Z ^ 2025-05-07T20:01:16.0863853Z 2025-05-07T20:01:37.1077080Z [130/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o 2025-05-07T20:01:37.1099871Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:37.1102956Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.1105164Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:37.1105905Z ^ 2025-05-07T20:01:37.1106219Z 2025-05-07T20:01:37.1106656Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:37.1107297Z 2025-05-07T20:01:37.1108941Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:37.1110906Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:37.1111627Z ^ 2025-05-07T20:01:37.1111909Z 2025-05-07T20:01:40.6929060Z [131/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o 2025-05-07T20:01:40.6949486Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:01:40.6952054Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.6954291Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.6955080Z ^ 2025-05-07T20:01:40.6955398Z 2025-05-07T20:01:40.6955833Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.6956423Z 2025-05-07T20:01:40.6957931Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.6959961Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:40.6960707Z ^ 2025-05-07T20:01:40.6961012Z 2025-05-07T20:01:40.6962463Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.6964690Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.6965495Z ^ 2025-05-07T20:01:40.6965932Z detected during: 2025-05-07T20:01:40.6991957Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.7040394Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.7088640Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.7116670Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.7118609Z 2025-05-07T20:01:40.7119054Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.7119698Z 2025-05-07T20:01:40.7121054Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.7123054Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.7123794Z ^ 2025-05-07T20:01:40.7124226Z detected during: 2025-05-07T20:01:40.7148025Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const 
ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.7198439Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params 
&, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.7247455Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.7296489Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.7323799Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.7325809Z 2025-05-07T20:01:40.7327125Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.7329184Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.7329991Z ^ 2025-05-07T20:01:40.7330456Z detected during: 2025-05-07T20:01:40.7356592Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.7425339Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.7474770Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, 
cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.7503003Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, 
at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.7505210Z 2025-05-07T20:01:40.7505672Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.7506276Z 2025-05-07T20:01:40.7507780Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.7509785Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.7510485Z ^ 2025-05-07T20:01:40.7511139Z detected during: 2025-05-07T20:01:40.7535288Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.7583263Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.7631364Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.7679479Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.7707315Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.7709550Z 2025-05-07T20:01:40.7711042Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.7713076Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.7713862Z ^ 2025-05-07T20:01:40.7714749Z detected during: 2025-05-07T20:01:40.7740019Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.7786350Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.7834455Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.7861512Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.7863641Z 2025-05-07T20:01:40.7864077Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.7864867Z 2025-05-07T20:01:40.7866271Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.7868247Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.7868987Z ^ 2025-05-07T20:01:40.7869366Z detected during: 2025-05-07T20:01:40.7893160Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.7942822Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.7987267Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8030707Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8055503Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8057307Z 2025-05-07T20:01:40.8058611Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.8060432Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.8061211Z ^ 2025-05-07T20:01:40.8061692Z detected during: 2025-05-07T20:01:40.8084150Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.8126842Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8169991Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8194633Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8196291Z 2025-05-07T20:01:40.8196650Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.8197194Z 2025-05-07T20:01:40.8198366Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.8200049Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.8200638Z ^ 2025-05-07T20:01:40.8201000Z detected during: 2025-05-07T20:01:40.8222295Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.8269741Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.8317239Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8366402Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8394548Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8396537Z 2025-05-07T20:01:40.8397929Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.8399933Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.8400742Z ^ 2025-05-07T20:01:40.8401221Z detected during: 2025-05-07T20:01:40.8419621Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.8446724Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8474054Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8489580Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8490697Z 2025-05-07T20:01:40.8490933Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.8491285Z 2025-05-07T20:01:40.8492125Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.8493208Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:01:40.8493614Z ^ 2025-05-07T20:01:40.8493850Z detected during: 2025-05-07T20:01:40.8507409Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.8535096Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.8562118Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8589524Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8605032Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8606157Z 2025-05-07T20:01:40.8606940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.8608064Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.8608497Z ^ 2025-05-07T20:01:40.8608960Z detected during: 2025-05-07T20:01:40.8623108Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.8650024Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8677357Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8692883Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8694047Z 2025-05-07T20:01:40.8694296Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:40.8694679Z 2025-05-07T20:01:40.8695470Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:40.8696561Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:40.8696991Z ^ 2025-05-07T20:01:40.8697241Z detected during: 2025-05-07T20:01:40.8711021Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:40.8739272Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:40.8766240Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:40.8793571Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:40.8809249Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu 2025-05-07T20:01:40.8810383Z 2025-05-07T20:01:50.4855686Z [132/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o 2025-05-07T20:01:50.4867948Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:50.4869512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.4870665Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:50.4871113Z ^ 2025-05-07T20:01:50.4871318Z 2025-05-07T20:01:50.4871567Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:50.4871928Z 2025-05-07T20:01:50.4872745Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:50.4873913Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:50.4874397Z ^ 2025-05-07T20:01:50.4874569Z 2025-05-07T20:01:51.5882836Z [133/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o 2025-05-07T20:01:51.5895487Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:51.5897063Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.5898192Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.5898664Z ^ 2025-05-07T20:01:51.5898840Z 2025-05-07T20:01:51.5899080Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.5899453Z 2025-05-07T20:01:51.5900259Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.5901435Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:51.5901873Z ^ 2025-05-07T20:01:51.5902066Z 2025-05-07T20:01:51.5902846Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.5904085Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.5904532Z ^ 2025-05-07T20:01:51.5904825Z detected during: 2025-05-07T20:01:51.5919845Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.5947400Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.5975249Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.5991212Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) 
[with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.5992349Z 2025-05-07T20:01:51.5992591Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.5992947Z 2025-05-07T20:01:51.5993758Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.5994935Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.5995361Z ^ 2025-05-07T20:01:51.5995596Z detected during: 2025-05-07T20:01:51.6009281Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.6037077Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6064637Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6092478Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6108364Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6109635Z 2025-05-07T20:01:51.6110448Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy 
capture of "this" is deprecated 2025-05-07T20:01:51.6111579Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6112053Z ^ 2025-05-07T20:01:51.6112325Z detected during: 2025-05-07T20:01:51.6126760Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6154264Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6182169Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6198025Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6199141Z 2025-05-07T20:01:51.6199403Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.6199761Z 2025-05-07T20:01:51.6200545Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6201673Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6202085Z ^ 2025-05-07T20:01:51.6202349Z detected during: 2025-05-07T20:01:51.6215957Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.6243730Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6271247Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6299133Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6315149Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6316334Z 2025-05-07T20:01:51.6317120Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6318271Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6318738Z ^ 2025-05-07T20:01:51.6319007Z detected during: 2025-05-07T20:01:51.6333496Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6360972Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6388974Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6404789Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6405907Z 2025-05-07T20:01:51.6406176Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.6406532Z 2025-05-07T20:01:51.6407314Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6408475Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6409004Z ^ 2025-05-07T20:01:51.6409239Z detected during: 2025-05-07T20:01:51.6422783Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.6450533Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6477889Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6505699Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6521632Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6522743Z 2025-05-07T20:01:51.6523519Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6524643Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6525083Z ^ 2025-05-07T20:01:51.6525342Z detected during: 2025-05-07T20:01:51.6539678Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6567079Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6594887Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6610743Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6611851Z 2025-05-07T20:01:51.6612084Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.6612442Z 2025-05-07T20:01:51.6613215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6614368Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6614757Z ^ 2025-05-07T20:01:51.6614985Z detected during: 2025-05-07T20:01:51.6628507Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.6656164Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, 
cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6683588Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:01:51.6711652Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6727397Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6728516Z 2025-05-07T20:01:51.6729292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6730415Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6730841Z ^ 2025-05-07T20:01:51.6731104Z detected during: 2025-05-07T20:01:51.6745522Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6772851Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6800778Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6816621Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6817738Z 2025-05-07T20:01:51.6817969Z Remark: The warnings 
can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.6818316Z 2025-05-07T20:01:51.6819092Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6820182Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6820672Z ^ 2025-05-07T20:01:51.6820890Z detected during: 2025-05-07T20:01:51.6834390Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.6861985Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, 
cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6889343Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.6917995Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.6933914Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.6935070Z 2025-05-07T20:01:51.6935862Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.6936971Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.6937414Z ^ 2025-05-07T20:01:51.6937671Z detected during: 2025-05-07T20:01:51.6952192Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.6979518Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.7007974Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, 
cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.7024018Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.7025123Z 2025-05-07T20:01:51.7025359Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:51.7025718Z 2025-05-07T20:01:51.7026500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:51.7027589Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:51.7027981Z ^ 2025-05-07T20:01:51.7028209Z detected during: 2025-05-07T20:01:51.7041723Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, 
SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=13, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=2, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<128>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:51.7069447Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:51.7096778Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:51.7124907Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<128>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b4x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:51.7140664Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=128, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu 2025-05-07T20:01:51.7141829Z 2025-05-07T20:01:58.8562375Z [134/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o 2025-05-07T20:01:58.8574700Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:01:58.8576343Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.8577492Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.8577986Z ^ 2025-05-07T20:01:58.8578163Z 2025-05-07T20:01:58.8578407Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.8578785Z 2025-05-07T20:01:58.8579590Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.8580766Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:01:58.8581303Z ^ 2025-05-07T20:01:58.8581480Z 2025-05-07T20:01:58.8582291Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.8583504Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.8583968Z ^ 2025-05-07T20:01:58.8584237Z detected during: 2025-05-07T20:01:58.8598530Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.8626322Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.8654038Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.8669880Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.8671003Z 2025-05-07T20:01:58.8671272Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.8671626Z 2025-05-07T20:01:58.8672416Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.8673531Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.8673936Z ^ 2025-05-07T20:01:58.8674196Z detected during: 2025-05-07T20:01:58.8687858Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.8715832Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.8745311Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.8773025Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.8788782Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 
2025-05-07T20:01:58.8789898Z 2025-05-07T20:01:58.8790679Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.8791831Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.8792300Z ^ 2025-05-07T20:01:58.8792576Z detected during: 2025-05-07T20:01:58.8806804Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.8834075Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.8861567Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.8877223Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.8878329Z 2025-05-07T20:01:58.8878569Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.8878928Z 2025-05-07T20:01:58.8879706Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.8880841Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.8881231Z ^ 2025-05-07T20:01:58.8881462Z detected during: 2025-05-07T20:01:58.8894903Z instantiation of 
"auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.8928979Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.8956565Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.8984304Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.8999842Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9000954Z 2025-05-07T20:01:58.9001788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9002931Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9003374Z ^ 2025-05-07T20:01:58.9003631Z detected during: 2025-05-07T20:01:58.9018073Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9045287Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9074415Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9089994Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9091172Z 2025-05-07T20:01:58.9091423Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.9091779Z 2025-05-07T20:01:58.9092590Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9093684Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9094116Z ^ 2025-05-07T20:01:58.9094375Z detected during: 2025-05-07T20:01:58.9107933Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.9135905Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9163101Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9190853Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9206782Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9207951Z 2025-05-07T20:01:58.9208909Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9210087Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9210542Z ^ 2025-05-07T20:01:58.9210840Z detected during: 2025-05-07T20:01:58.9225291Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9252537Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9280150Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9295918Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9297051Z 2025-05-07T20:01:58.9297291Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.9297688Z 2025-05-07T20:01:58.9298498Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is 
deprecated 2025-05-07T20:01:58.9299644Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9300068Z ^ 2025-05-07T20:01:58.9300307Z detected during: 2025-05-07T20:01:58.9314366Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.9341932Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9370507Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9398790Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9414882Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9416051Z 2025-05-07T20:01:58.9416861Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9418059Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9418522Z ^ 2025-05-07T20:01:58.9418799Z detected during: 2025-05-07T20:01:58.9433221Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:01:58.9460696Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 
340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9488260Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9503993Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9505141Z 2025-05-07T20:01:58.9505388Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:01:58.9505764Z 2025-05-07T20:01:58.9506551Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9507667Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9508081Z ^ 2025-05-07T20:01:58.9508341Z detected during: 2025-05-07T20:01:58.9521945Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.9549706Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9576622Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9603997Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9619701Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9620837Z 2025-05-07T20:01:58.9621624Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9622774Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9623227Z ^ 2025-05-07T20:01:58.9623563Z detected during: 2025-05-07T20:01:58.9637801Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9665050Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9694340Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9711963Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9713159Z 2025-05-07T20:01:58.9713403Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:01:58.9713760Z 2025-05-07T20:01:58.9714605Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:01:58.9715704Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:01:58.9716197Z ^ 2025-05-07T20:01:58.9716454Z detected during: 2025-05-07T20:01:58.9730261Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<1>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:01:58.9759255Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:01:58.9786699Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:01:58.9814326Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:01:58.9830103Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu 2025-05-07T20:01:58.9831307Z 2025-05-07T20:02:02.0411060Z [135/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 
-MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o 2025-05-07T20:02:02.0423502Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:02.0425043Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0426221Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0426690Z ^ 2025-05-07T20:02:02.0426864Z 2025-05-07T20:02:02.0427105Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.0427478Z 2025-05-07T20:02:02.0428278Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0429435Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:02.0429877Z ^ 2025-05-07T20:02:02.0430050Z 2025-05-07T20:02:02.0430849Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0432120Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0432567Z ^ 2025-05-07T20:02:02.0432871Z detected during: 2025-05-07T20:02:02.0447102Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.0474225Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.0501698Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.0518722Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.0519873Z 2025-05-07T20:02:02.0520122Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.0520587Z 2025-05-07T20:02:02.0521406Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0522511Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0522946Z ^ 2025-05-07T20:02:02.0523188Z detected during: 2025-05-07T20:02:02.0536741Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, 
SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:02.0564310Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.0591348Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.0618997Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.0634677Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.0635819Z 2025-05-07T20:02:02.0636602Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0637756Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0638201Z ^ 2025-05-07T20:02:02.0638504Z detected during: 2025-05-07T20:02:02.0652694Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.0679747Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.0707225Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.0722983Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, 
USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.0724097Z 2025-05-07T20:02:02.0724363Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.0724719Z 2025-05-07T20:02:02.0725512Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0726623Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0727051Z ^ 2025-05-07T20:02:02.0727282Z detected during: 2025-05-07T20:02:02.0740849Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:02.0768380Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.0795368Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.0822887Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.0839700Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.0840816Z 2025-05-07T20:02:02.0841625Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0842784Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0843255Z ^ 2025-05-07T20:02:02.0843528Z detected during: 2025-05-07T20:02:02.0857712Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.0884696Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.0912369Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.0927974Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.0929091Z 2025-05-07T20:02:02.0929338Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.0929710Z 2025-05-07T20:02:02.0930500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.0931620Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.0932023Z ^ 2025-05-07T20:02:02.0932275Z detected during: 2025-05-07T20:02:02.0945802Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:02.0973341Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1000363Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1028064Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1043674Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1044807Z 2025-05-07T20:02:02.1045593Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.1046733Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.1047186Z ^ 2025-05-07T20:02:02.1047476Z detected during: 2025-05-07T20:02:02.1061671Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1088736Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1116589Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, 
void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1132177Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1133311Z 2025-05-07T20:02:02.1133555Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.1133907Z 2025-05-07T20:02:02.1134707Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.1135842Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.1136271Z ^ 2025-05-07T20:02:02.1136533Z detected during: 2025-05-07T20:02:02.1151154Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:02.1178671Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1219065Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1246971Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1262596Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1263792Z 2025-05-07T20:02:02.1264611Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.1265764Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.1266219Z ^ 2025-05-07T20:02:02.1266518Z detected during: 2025-05-07T20:02:02.1280678Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1307863Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1335546Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1351188Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1352354Z 2025-05-07T20:02:02.1352627Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.1352986Z 2025-05-07T20:02:02.1353781Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.1354904Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 
2025-05-07T20:02:02.1355337Z ^ 2025-05-07T20:02:02.1355571Z detected during: 2025-05-07T20:02:02.1369064Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:02.1396549Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1423757Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1451160Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1466825Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1467941Z 2025-05-07T20:02:02.1468750Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.1469880Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.1470351Z ^ 2025-05-07T20:02:02.1470630Z detected during: 2025-05-07T20:02:02.1485863Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1513236Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1540639Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1556353Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1557469Z 2025-05-07T20:02:02.1557744Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:02.1558125Z 2025-05-07T20:02:02.1558913Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:02.1560065Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:02.1560472Z ^ 2025-05-07T20:02:02.1560736Z detected during: 2025-05-07T20:02:02.1574199Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:02.1601723Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:02.1628943Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:02.1656388Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:02.1672049Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu 2025-05-07T20:02:02.1673189Z 2025-05-07T20:02:03.5789765Z [136/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o 2025-05-07T20:02:03.5802370Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:03.5804003Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.5805142Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.5805575Z ^ 2025-05-07T20:02:03.5805743Z 2025-05-07T20:02:03.5805997Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.5806344Z 2025-05-07T20:02:03.5807139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.5808278Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:03.5808910Z ^ 2025-05-07T20:02:03.5809075Z 2025-05-07T20:02:03.5809866Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.5811020Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.5811470Z ^ 2025-05-07T20:02:03.5811771Z detected during: 2025-05-07T20:02:03.5826181Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.5853323Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.5880871Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.5896442Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.5897587Z 2025-05-07T20:02:03.5897825Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.5898185Z 2025-05-07T20:02:03.5898972Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.5900064Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.5900450Z ^ 2025-05-07T20:02:03.5900681Z detected during: 2025-05-07T20:02:03.5914461Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:03.5943859Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.5970841Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.5998286Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6014029Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6015159Z 2025-05-07T20:02:03.6015945Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6017088Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6017537Z ^ 2025-05-07T20:02:03.6017831Z detected during: 2025-05-07T20:02:03.6032202Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6059277Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6086769Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6102341Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6103549Z 2025-05-07T20:02:03.6103797Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.6104157Z 2025-05-07T20:02:03.6104973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6106077Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6106538Z ^ 2025-05-07T20:02:03.6106783Z detected during: 2025-05-07T20:02:03.6120506Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:03.6148080Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6175074Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6202594Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6218283Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6219476Z 2025-05-07T20:02:03.6220267Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6221395Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6221855Z ^ 2025-05-07T20:02:03.6222143Z detected during: 2025-05-07T20:02:03.6236419Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6264514Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6291943Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6307524Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6308935Z 2025-05-07T20:02:03.6309216Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.6309576Z 2025-05-07T20:02:03.6310364Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6311484Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6311908Z ^ 2025-05-07T20:02:03.6312143Z detected during: 2025-05-07T20:02:03.6325636Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:03.6353175Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6380164Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6407626Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6423396Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6424509Z 2025-05-07T20:02:03.6425317Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6426439Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6426987Z ^ 2025-05-07T20:02:03.6427260Z detected during: 2025-05-07T20:02:03.6441491Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6468574Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6495977Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6511752Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6512872Z 2025-05-07T20:02:03.6513116Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.6513489Z 2025-05-07T20:02:03.6514278Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6515399Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6515809Z ^ 2025-05-07T20:02:03.6516064Z detected during: 2025-05-07T20:02:03.6529597Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:03.6557184Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6585061Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6612698Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6628334Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6629507Z 2025-05-07T20:02:03.6630292Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6631436Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6631881Z ^ 2025-05-07T20:02:03.6632178Z detected during: 2025-05-07T20:02:03.6646321Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6673325Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6700688Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6716440Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6717578Z 2025-05-07T20:02:03.6717821Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.6718175Z 2025-05-07T20:02:03.6718990Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6720093Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6720534Z ^ 2025-05-07T20:02:03.6720770Z detected during: 2025-05-07T20:02:03.6734317Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:03.6761809Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6788824Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6816529Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6832197Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6833326Z 2025-05-07T20:02:03.6834111Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6835259Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6835709Z ^ 2025-05-07T20:02:03.6835997Z detected during: 2025-05-07T20:02:03.6850220Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6878162Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.6905660Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.6921414Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.6922551Z 2025-05-07T20:02:03.6922797Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:03.6923154Z 2025-05-07T20:02:03.6923967Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:03.6925191Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:03.6925621Z ^ 2025-05-07T20:02:03.6925838Z detected during: 2025-05-07T20:02:03.6939332Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:03.6966754Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:03.6993644Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:03.7021121Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:03.7036667Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu 2025-05-07T20:02:03.7037775Z 2025-05-07T20:02:05.0456248Z [137/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o 2025-05-07T20:02:05.0468721Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:05.0470281Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0471483Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0471946Z ^ 2025-05-07T20:02:05.0472117Z 2025-05-07T20:02:05.0472382Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.0472733Z 2025-05-07T20:02:05.0473535Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0474693Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:05.0475132Z ^ 2025-05-07T20:02:05.0475325Z 2025-05-07T20:02:05.0476103Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0477278Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0477719Z ^ 2025-05-07T20:02:05.0478006Z detected during: 2025-05-07T20:02:05.0492234Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.0519587Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.0547181Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.0562783Z 
instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.0563955Z 2025-05-07T20:02:05.0564200Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.0564558Z 2025-05-07T20:02:05.0565367Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0566458Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0566877Z ^ 2025-05-07T20:02:05.0567109Z detected during: 2025-05-07T20:02:05.0580632Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.0608208Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.0637245Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.0664790Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.0680413Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 
2025-05-07T20:02:05.0681523Z 2025-05-07T20:02:05.0682338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0683467Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0683939Z ^ 2025-05-07T20:02:05.0684217Z detected during: 2025-05-07T20:02:05.0698473Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.0725615Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.0753071Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.0768686Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.0769789Z 2025-05-07T20:02:05.0770053Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.0770409Z 2025-05-07T20:02:05.0771227Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0772376Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0772787Z ^ 2025-05-07T20:02:05.0773046Z detected during: 2025-05-07T20:02:05.0786600Z instantiation of "auto 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, 
ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.0814208Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.0841232Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.0868671Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.0884231Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.0885337Z 2025-05-07T20:02:05.0886129Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0887281Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0887749Z ^ 2025-05-07T20:02:05.0888021Z detected during: 2025-05-07T20:02:05.0902221Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, 
cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.0929435Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.0957962Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.0973576Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.0974719Z 2025-05-07T20:02:05.0974966Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.0975339Z 2025-05-07T20:02:05.0976123Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.0977238Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.0977641Z ^ 2025-05-07T20:02:05.0977902Z detected during: 2025-05-07T20:02:05.0991432Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.1019128Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1046182Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1073651Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1089207Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1090341Z 2025-05-07T20:02:05.1091128Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1092281Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1092737Z ^ 2025-05-07T20:02:05.1093033Z detected during: 2025-05-07T20:02:05.1107292Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1134443Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1161922Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1177506Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1178642Z 2025-05-07T20:02:05.1178891Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.1179253Z 2025-05-07T20:02:05.1180065Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:02:05.1181195Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1181628Z ^ 2025-05-07T20:02:05.1181862Z detected during: 2025-05-07T20:02:05.1195453Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.1223090Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1250105Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1278649Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, 
cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1294201Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1295344Z 2025-05-07T20:02:05.1296153Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1297278Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1297738Z ^ 2025-05-07T20:02:05.1298002Z detected during: 2025-05-07T20:02:05.1312406Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, 
cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1339370Z instantiation of "void 
cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1366767Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1382400Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1383551Z 2025-05-07T20:02:05.1383813Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.1384171Z 2025-05-07T20:02:05.1384952Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1386067Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1386469Z ^ 2025-05-07T20:02:05.1386725Z detected during: 2025-05-07T20:02:05.1400243Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, 
SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.1429044Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1456108Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, 
int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1483657Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1499306Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1500414Z 2025-05-07T20:02:05.1501200Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1502391Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1502865Z ^ 2025-05-07T20:02:05.1503175Z detected during: 2025-05-07T20:02:05.1517711Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1544803Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1573515Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1589343Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1590459Z 2025-05-07T20:02:05.1590708Z Remark: The warnings can be 
suppressed with "-diag-suppress " 2025-05-07T20:02:05.1591084Z 2025-05-07T20:02:05.1591876Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1593001Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1593419Z ^ 2025-05-07T20:02:05.1593679Z detected during: 2025-05-07T20:02:05.1607213Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, cute::C<4>, cute::C<1>>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.1634990Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1662001Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1689483Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1705156Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu 2025-05-07T20:02:05.1706321Z 2025-05-07T20:02:05.1717690Z [138/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o 
-MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o 2025-05-07T20:02:05.1729690Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:05.1731297Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1732428Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1732892Z ^ 2025-05-07T20:02:05.1733066Z 2025-05-07T20:02:05.1733308Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.1733681Z 2025-05-07T20:02:05.1734484Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1735672Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:05.1736110Z ^ 2025-05-07T20:02:05.1736359Z 2025-05-07T20:02:05.1737139Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1738247Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1738714Z ^ 2025-05-07T20:02:05.1739012Z detected during: 2025-05-07T20:02:05.1753165Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, 
TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1779896Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1807257Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, 
cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1822852Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.1824049Z 2025-05-07T20:02:05.1824323Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.1824676Z 2025-05-07T20:02:05.1825463Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1826581Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1827009Z ^ 2025-05-07T20:02:05.1827249Z detected during: 2025-05-07T20:02:05.1840782Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, 
SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.1868126Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1895852Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.1923332Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, 
void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.1938853Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.1939964Z 2025-05-07T20:02:05.1940780Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.1941909Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.1942379Z ^ 2025-05-07T20:02:05.1942648Z detected during: 2025-05-07T20:02:05.1956824Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, 
cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.1983617Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2010945Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, 
cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2026481Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2027591Z 2025-05-07T20:02:05.2027835Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.2028218Z 2025-05-07T20:02:05.2029006Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2030129Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2030567Z ^ 2025-05-07T20:02:05.2030832Z detected during: 2025-05-07T20:02:05.2044223Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.2071557Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2098293Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2125763Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2141254Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2142385Z 2025-05-07T20:02:05.2143176Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2144365Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2144817Z ^ 2025-05-07T20:02:05.2145143Z detected during: 2025-05-07T20:02:05.2159194Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2186035Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2214585Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2230226Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2231362Z 2025-05-07T20:02:05.2231607Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.2232001Z 2025-05-07T20:02:05.2232823Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2233915Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2234344Z ^ 2025-05-07T20:02:05.2234578Z detected during: 2025-05-07T20:02:05.2247996Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) 
const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.2275322Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2302130Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2329589Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2345133Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2346262Z 2025-05-07T20:02:05.2347081Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2348220Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2348671Z ^ 2025-05-07T20:02:05.2348970Z detected during: 2025-05-07T20:02:05.2363043Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2389805Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2417182Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2432755Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2433897Z 2025-05-07T20:02:05.2434162Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.2434516Z 2025-05-07T20:02:05.2435299Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2436411Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2436831Z ^ 2025-05-07T20:02:05.2437063Z detected during: 2025-05-07T20:02:05.2450452Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.2477867Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2504668Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2532860Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2548459Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2549569Z 2025-05-07T20:02:05.2550375Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2551497Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2551942Z ^ 2025-05-07T20:02:05.2552196Z detected during: 2025-05-07T20:02:05.2566237Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const 
cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2592949Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2620274Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2635761Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2636870Z 2025-05-07T20:02:05.2637104Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.2637463Z 2025-05-07T20:02:05.2638287Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2639370Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2639805Z ^ 2025-05-07T20:02:05.2640042Z detected during: 2025-05-07T20:02:05.2653404Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) 
const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.2680641Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2707418Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2734712Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2750158Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2751270Z 2025-05-07T20:02:05.2752048Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2753173Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2753604Z ^ 2025-05-07T20:02:05.2753868Z detected during: 2025-05-07T20:02:05.2767879Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2794570Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2821866Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2838295Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2839410Z 2025-05-07T20:02:05.2839648Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:05.2840022Z 2025-05-07T20:02:05.2840813Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:05.2841890Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:05.2842293Z ^ 2025-05-07T20:02:05.2842513Z detected during: 2025-05-07T20:02:05.2855884Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, 
SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:05.2883110Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:05.2909946Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, 
cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:05.2937125Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:05.2952641Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu 2025-05-07T20:02:05.2953743Z 2025-05-07T20:02:11.1198277Z [139/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o.d -x cu -c 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o 2025-05-07T20:02:11.1210880Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:11.1212563Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1213716Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1214173Z ^ 2025-05-07T20:02:11.1214375Z 2025-05-07T20:02:11.1214619Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.1214975Z 2025-05-07T20:02:11.1215788Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1216953Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:11.1217430Z ^ 2025-05-07T20:02:11.1217607Z 2025-05-07T20:02:11.1218391Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1219537Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1220000Z ^ 2025-05-07T20:02:11.1220263Z detected during: 2025-05-07T20:02:11.1234636Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1261895Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1289420Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.1316558Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1318073Z 2025-05-07T20:02:11.1318351Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.1318715Z 2025-05-07T20:02:11.1319531Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1320634Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1321080Z ^ 2025-05-07T20:02:11.1321324Z detected during: 2025-05-07T20:02:11.1334950Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.1364126Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1391138Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1418737Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:11.1434398Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1435517Z 2025-05-07T20:02:11.1436338Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1437469Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1437945Z ^ 2025-05-07T20:02:11.1438273Z detected during: 2025-05-07T20:02:11.1452446Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1479436Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1506898Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.1522657Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1523780Z 2025-05-07T20:02:11.1524057Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.1524416Z 2025-05-07T20:02:11.1525204Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:11.1526414Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1526826Z ^ 2025-05-07T20:02:11.1527097Z detected during: 2025-05-07T20:02:11.1540627Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.1568167Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1595159Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1622834Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.1638498Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1639649Z 2025-05-07T20:02:11.1640439Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1641596Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1642064Z ^ 2025-05-07T20:02:11.1642336Z detected during: 2025-05-07T20:02:11.1657298Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1684399Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1712063Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.1727678Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1728797Z 2025-05-07T20:02:11.1729046Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.1729430Z 2025-05-07T20:02:11.1730220Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1731339Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1731751Z ^ 2025-05-07T20:02:11.1732009Z detected during: 2025-05-07T20:02:11.1745566Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.1773032Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1800012Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1827661Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.1843238Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1844385Z 2025-05-07T20:02:11.1845170Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1846339Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1846789Z ^ 2025-05-07T20:02:11.1847109Z detected during: 2025-05-07T20:02:11.1861270Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.1888140Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.1915673Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.1931174Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.1932298Z 2025-05-07T20:02:11.1932536Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.1932883Z 2025-05-07T20:02:11.1933676Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.1934749Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.1935149Z ^ 
2025-05-07T20:02:11.1935364Z detected during: 2025-05-07T20:02:11.1948868Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.1977051Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.2003986Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.2031699Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.2047308Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.2048460Z 2025-05-07T20:02:11.2049268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.2050397Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.2050869Z ^ 2025-05-07T20:02:11.2051163Z detected during: 2025-05-07T20:02:11.2065358Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.2092260Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.2119724Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.2135231Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.2136343Z 2025-05-07T20:02:11.2136592Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.2136937Z 2025-05-07T20:02:11.2137714Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.2138802Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.2139190Z ^ 2025-05-07T20:02:11.2139423Z detected during: 2025-05-07T20:02:11.2152949Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.2180399Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.2207338Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.2234907Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.2250485Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.2251598Z 2025-05-07T20:02:11.2252382Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.2253525Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.2253995Z ^ 2025-05-07T20:02:11.2254259Z detected during: 2025-05-07T20:02:11.2268484Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.2296215Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.2323864Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, 
cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.2339453Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.2340571Z 2025-05-07T20:02:11.2340817Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:11.2341227Z 2025-05-07T20:02:11.2342015Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:11.2343124Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:11.2343578Z ^ 
2025-05-07T20:02:11.2343826Z detected during: 2025-05-07T20:02:11.2357287Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple>, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, 
SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:11.2384775Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, 
cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:11.2411988Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, 
cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:11.2439425Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:11.2454977Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu 2025-05-07T20:02:11.2456113Z 2025-05-07T20:02:14.1144512Z [140/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o 2025-05-07T20:02:14.1156948Z nvcc warning : Support for offline compilation for architectures 
prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:14.1158622Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1159779Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1160227Z ^ 2025-05-07T20:02:14.1160401Z 2025-05-07T20:02:14.1160674Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1161028Z 2025-05-07T20:02:14.1161832Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1163009Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:14.1163479Z ^ 2025-05-07T20:02:14.1163655Z 2025-05-07T20:02:14.1164473Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1165613Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1166056Z ^ 2025-05-07T20:02:14.1166341Z detected during: 2025-05-07T20:02:14.1180502Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1207441Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1234960Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:14.1250495Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1251682Z 2025-05-07T20:02:14.1251927Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1252282Z 2025-05-07T20:02:14.1253085Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1254183Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1254608Z ^ 2025-05-07T20:02:14.1254863Z detected during: 2025-05-07T20:02:14.1268325Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:14.1296878Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1324049Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1354377Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1369990Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1371135Z 2025-05-07T20:02:14.1371932Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1373078Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1373527Z ^ 2025-05-07T20:02:14.1373833Z detected during: 2025-05-07T20:02:14.1387950Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1414933Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1442212Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1457696Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1458838Z 2025-05-07T20:02:14.1459081Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1459436Z 2025-05-07T20:02:14.1460274Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1461372Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1461823Z ^ 2025-05-07T20:02:14.1462057Z detected during: 2025-05-07T20:02:14.1475525Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:14.1502805Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1529828Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1557179Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1572638Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1573774Z 2025-05-07T20:02:14.1574561Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1575701Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1576145Z ^ 2025-05-07T20:02:14.1576439Z detected during: 2025-05-07T20:02:14.1590548Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1618172Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1645382Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1660828Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1661949Z 2025-05-07T20:02:14.1662185Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1662558Z 2025-05-07T20:02:14.1663377Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1664463Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1664865Z ^ 2025-05-07T20:02:14.1665115Z detected during: 2025-05-07T20:02:14.1678481Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:14.1705709Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1732621Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1759803Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1775277Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1776379Z 2025-05-07T20:02:14.1777168Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1778279Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1778720Z ^ 2025-05-07T20:02:14.1778972Z detected during: 2025-05-07T20:02:14.1793043Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1819804Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1847013Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1862424Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1863599Z 2025-05-07T20:02:14.1863847Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.1864191Z 2025-05-07T20:02:14.1865002Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1866087Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1866491Z ^ 2025-05-07T20:02:14.1866710Z detected during: 2025-05-07T20:02:14.1880068Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:14.1908021Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.1934840Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.1962008Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.1977405Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.1978512Z 2025-05-07T20:02:14.1979296Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.1980402Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.1980841Z ^ 2025-05-07T20:02:14.1981090Z detected during: 2025-05-07T20:02:14.1995166Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.2021984Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.2049185Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.2064659Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.2065795Z 2025-05-07T20:02:14.2066028Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.2066384Z 2025-05-07T20:02:14.2067160Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.2068251Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.2068637Z ^ 2025-05-07T20:02:14.2068889Z detected during: 2025-05-07T20:02:14.2082314Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:14.2109632Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.2136360Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.2163532Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.2178933Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.2180052Z 2025-05-07T20:02:14.2180825Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.2181947Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.2182371Z ^ 2025-05-07T20:02:14.2182632Z detected during: 2025-05-07T20:02:14.2196672Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.2224477Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.2251652Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.2267115Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.2268233Z 2025-05-07T20:02:14.2268469Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:14.2268811Z 2025-05-07T20:02:14.2269634Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:14.2270709Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:14.2271112Z ^ 2025-05-07T20:02:14.2271328Z detected during: 2025-05-07T20:02:14.2284690Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=6, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:14.2312117Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:14.2338785Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:14.2365941Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:14.2381361Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu 2025-05-07T20:02:14.2382496Z 2025-05-07T20:02:19.1854139Z [141/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o 2025-05-07T20:02:19.1866839Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:19.1868425Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.1869558Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.1870035Z ^ 2025-05-07T20:02:19.1870214Z 2025-05-07T20:02:19.1870483Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1870837Z 2025-05-07T20:02:19.1871655Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.1872823Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:19.1873265Z ^ 2025-05-07T20:02:19.1873461Z 2025-05-07T20:02:19.1874247Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.1875373Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.1875798Z ^ 2025-05-07T20:02:19.1876065Z detected during: 2025-05-07T20:02:19.1890155Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.1918272Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.1945416Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.1960830Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.1961987Z 2025-05-07T20:02:19.1962224Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.1962587Z 2025-05-07T20:02:19.1963368Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.1964465Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.1964860Z ^ 2025-05-07T20:02:19.1965099Z detected during: 2025-05-07T20:02:19.1978452Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:19.2005686Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2032478Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2059558Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2074966Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2076088Z 2025-05-07T20:02:19.2076908Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2078059Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2078502Z ^ 2025-05-07T20:02:19.2078800Z detected during: 2025-05-07T20:02:19.2092902Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2119836Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2147135Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2162596Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2163735Z 2025-05-07T20:02:19.2163977Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.2164355Z 2025-05-07T20:02:19.2165144Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2166242Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2166672Z ^ 2025-05-07T20:02:19.2166936Z detected during: 2025-05-07T20:02:19.2180295Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, 
GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 
2025-05-07T20:02:19.2208342Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2235299Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2262461Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, 
cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2277975Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2279116Z 2025-05-07T20:02:19.2279903Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2281093Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2281543Z ^ 2025-05-07T20:02:19.2281897Z detected during: 2025-05-07T20:02:19.2295914Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2322836Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2350079Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2365508Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, 
at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2366645Z 2025-05-07T20:02:19.2366888Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.2367243Z 2025-05-07T20:02:19.2368031Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2369153Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2369586Z ^ 2025-05-07T20:02:19.2369825Z detected during: 2025-05-07T20:02:19.2383218Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:19.2410671Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2437386Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, 
cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2464546Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2480016Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2481167Z 2025-05-07T20:02:19.2482391Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 835; warning : Advisory: 
'.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:19.2484882Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:19.2487383Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:19.2489893Z ptxas /tmp/tmpxft_00007e79_00000000-9_f4f4bf16_128_256_2_1_1_f.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:19.2491973Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2493102Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2493566Z ^ 2025-05-07T20:02:19.2493835Z detected during: 2025-05-07T20:02:19.2507952Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with 
ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:02:19.2535502Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2562698Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2578147Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2579265Z 2025-05-07T20:02:19.2579530Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.2579888Z 2025-05-07T20:02:19.2580673Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2581822Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2582233Z ^ 2025-05-07T20:02:19.2582492Z detected during: 2025-05-07T20:02:19.2595943Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, 
ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:19.2623322Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, 
cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2650016Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2677165Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:19.2692643Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2693762Z 2025-05-07T20:02:19.2694549Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2695704Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2696172Z ^ 2025-05-07T20:02:19.2696449Z detected during: 2025-05-07T20:02:19.2710737Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, 
cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2737443Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2764659Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2780108Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2781224Z 2025-05-07T20:02:19.2781470Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.2781876Z 2025-05-07T20:02:19.2782660Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" 
is deprecated 2025-05-07T20:02:19.2783862Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2784269Z ^ 2025-05-07T20:02:19.2784536Z detected during: 2025-05-07T20:02:19.2797879Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:19.2825249Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, 
cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2852624Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2879859Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, 
cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2895297Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2896431Z 2025-05-07T20:02:19.2897215Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2898362Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2898803Z ^ 2025-05-07T20:02:19.2899092Z detected during: 2025-05-07T20:02:19.2913340Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.2940083Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.2967240Z instantiation of "cutlass::Status 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, 
cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.2982699Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.2983881Z 2025-05-07T20:02:19.2984127Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:19.2984513Z 2025-05-07T20:02:19.2985324Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:19.2986421Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:19.2986879Z ^ 2025-05-07T20:02:19.2987112Z detected during: 2025-05-07T20:02:19.3000489Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:19.3027916Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, 
cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:19.3054571Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:19.3081746Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:19.3097155Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=false]" at 
line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu 2025-05-07T20:02:19.3098267Z 2025-05-07T20:02:20.5180522Z [142/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe 
--diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o 2025-05-07T20:02:20.5193054Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:20.5194704Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5195854Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5196299Z ^ 2025-05-07T20:02:20.5196498Z 2025-05-07T20:02:20.5196742Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5197096Z 2025-05-07T20:02:20.5197927Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5199070Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:20.5199537Z ^ 2025-05-07T20:02:20.5199710Z 2025-05-07T20:02:20.5200500Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5201645Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5202112Z ^ 2025-05-07T20:02:20.5202386Z detected during: 2025-05-07T20:02:20.5217039Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5245940Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5273776Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:20.5289568Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:20.5290712Z 2025-05-07T20:02:20.5290993Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5291376Z 2025-05-07T20:02:20.5292167Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5293317Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5293764Z ^ 2025-05-07T20:02:20.5294052Z detected during: 2025-05-07T20:02:20.5308500Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5336046Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5363909Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.5379674Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:20.5380807Z 2025-05-07T20:02:20.5381052Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5381408Z 
2025-05-07T20:02:20.5382195Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5383383Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5383860Z ^ 2025-05-07T20:02:20.5384128Z detected during: 2025-05-07T20:02:20.5398539Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5426193Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5454021Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.5469960Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:20.5471085Z 2025-05-07T20:02:20.5471333Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5471710Z 2025-05-07T20:02:20.5472912Z ptxas /tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5475412Z ptxas 
/tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5477891Z ptxas /tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5480351Z ptxas /tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5482838Z ptxas /tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5485363Z ptxas /tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5487849Z ptxas 
/tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5490359Z ptxas /tmp/tmpxft_00007e6b_00000000-9_f4f4bf16_128_192_2_2_1_f.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:20.5492446Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5493594Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5494041Z ^ 2025-05-07T20:02:20.5494332Z detected during: 2025-05-07T20:02:20.5508951Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5537251Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5565132Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.5580882Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:20.5582020Z 2025-05-07T20:02:20.5582266Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5582622Z 2025-05-07T20:02:20.5583473Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5584604Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5585080Z ^ 2025-05-07T20:02:20.5585380Z detected during: 2025-05-07T20:02:20.5599780Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5627377Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5655221Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.5671073Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:20.5672192Z 2025-05-07T20:02:20.5672460Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5672813Z 2025-05-07T20:02:20.5673606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:20.5674754Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:20.5675224Z ^ 2025-05-07T20:02:20.5675503Z detected during: 2025-05-07T20:02:20.5689874Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:20.5717436Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:20.5745339Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:20.5761126Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu 2025-05-07T20:02:20.5762270Z 2025-05-07T20:02:20.5762513Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:20.5762887Z 2025-05-07T20:02:22.8435457Z [143/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o 2025-05-07T20:02:22.8460071Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:22.8461635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8462810Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8463383Z ^ 2025-05-07T20:02:22.8463568Z 2025-05-07T20:02:22.8463821Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.8464206Z 2025-05-07T20:02:22.8465143Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8466444Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:22.8466892Z ^ 2025-05-07T20:02:22.8467072Z 2025-05-07T20:02:22.8467882Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8469073Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8469548Z ^ 2025-05-07T20:02:22.8469820Z detected during: 2025-05-07T20:02:22.8484330Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.8512058Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.8541007Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.8556905Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.8558022Z 2025-05-07T20:02:22.8558292Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.8558650Z 2025-05-07T20:02:22.8559443Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8560596Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8561053Z ^ 2025-05-07T20:02:22.8561352Z detected during: 2025-05-07T20:02:22.8575834Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.8603333Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.8631446Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.8647381Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.8648534Z 2025-05-07T20:02:22.8648780Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.8649137Z 2025-05-07T20:02:22.8649955Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8651088Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8651561Z ^ 2025-05-07T20:02:22.8651872Z detected during: 2025-05-07T20:02:22.8666375Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.8693809Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.8721967Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.8737880Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.8738997Z 2025-05-07T20:02:22.8739271Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.8739664Z 2025-05-07T20:02:22.8740456Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8741612Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8742091Z ^ 2025-05-07T20:02:22.8742404Z detected during: 2025-05-07T20:02:22.8756891Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.8784385Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.8812418Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.8828323Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.8829434Z 2025-05-07T20:02:22.8829684Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.8830067Z 2025-05-07T20:02:22.8830854Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8832006Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8832460Z ^ 2025-05-07T20:02:22.8832760Z detected during: 2025-05-07T20:02:22.8847872Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.8875528Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.8903587Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.8919533Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.8920679Z 2025-05-07T20:02:22.8920985Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.8921343Z 2025-05-07T20:02:22.8922138Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.8923286Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.8923756Z ^ 2025-05-07T20:02:22.8924017Z detected during: 
2025-05-07T20:02:22.8938434Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.8965842Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.8993687Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.9009557Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu 2025-05-07T20:02:22.9010664Z 2025-05-07T20:02:22.9010898Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9011307Z 2025-05-07T20:02:22.9369982Z [144/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o 
2025-05-07T20:02:22.9382300Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:22.9383957Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9385128Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9385599Z ^ 2025-05-07T20:02:22.9385767Z 2025-05-07T20:02:22.9386008Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9386364Z 2025-05-07T20:02:22.9387195Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9388316Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:22.9388753Z ^ 2025-05-07T20:02:22.9388916Z 2025-05-07T20:02:22.9389766Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9390865Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9391343Z ^ 2025-05-07T20:02:22.9391595Z detected during: 2025-05-07T20:02:22.9406024Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.9433733Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, 
cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.9461656Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.9477534Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.9478645Z 2025-05-07T20:02:22.9478912Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9479270Z 2025-05-07T20:02:22.9480058Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9481205Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9481665Z ^ 2025-05-07T20:02:22.9481934Z detected during: 2025-05-07T20:02:22.9496355Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.9523999Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.9552204Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.9568053Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.9569220Z 2025-05-07T20:02:22.9569465Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9569844Z 2025-05-07T20:02:22.9570635Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9571788Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9572251Z ^ 2025-05-07T20:02:22.9572538Z detected during: 2025-05-07T20:02:22.9587022Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.9615391Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.9643419Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.9659396Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.9660509Z 2025-05-07T20:02:22.9660775Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9661131Z 2025-05-07T20:02:22.9661918Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9663111Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9663655Z ^ 2025-05-07T20:02:22.9663931Z detected during: 2025-05-07T20:02:22.9678377Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.9705949Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.9734050Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.9749965Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.9751074Z 2025-05-07T20:02:22.9751318Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9751692Z 2025-05-07T20:02:22.9752487Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9753625Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9754068Z ^ 2025-05-07T20:02:22.9754351Z detected during: 2025-05-07T20:02:22.9768744Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 
2025-05-07T20:02:22.9796386Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at 
line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.9824538Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:22.9840405Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.9841563Z 2025-05-07T20:02:22.9841804Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9842157Z 2025-05-07T20:02:22.9842963Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:22.9844084Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:22.9844552Z ^ 2025-05-07T20:02:22.9844820Z detected during: 2025-05-07T20:02:22.9859255Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:22.9886714Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:22.9914810Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:22.9931372Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu 2025-05-07T20:02:22.9932486Z 2025-05-07T20:02:22.9932733Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:22.9933112Z 2025-05-07T20:02:23.7396608Z [145/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode 
arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o 2025-05-07T20:02:23.7409257Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:23.7410813Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7411963Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7412433Z ^ 2025-05-07T20:02:23.7412608Z 2025-05-07T20:02:23.7412852Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7413226Z 2025-05-07T20:02:23.7414036Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7415172Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:23.7415631Z ^ 2025-05-07T20:02:23.7415804Z 2025-05-07T20:02:23.7416606Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7417726Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7418192Z ^ 2025-05-07T20:02:23.7418454Z detected during: 2025-05-07T20:02:23.7433237Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7460809Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7488844Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7504822Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:23.7505951Z 2025-05-07T20:02:23.7506220Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7506573Z 2025-05-07T20:02:23.7507363Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7508521Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7509113Z ^ 2025-05-07T20:02:23.7509413Z detected during: 2025-05-07T20:02:23.7523921Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7551414Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7579400Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7595248Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:23.7596429Z 2025-05-07T20:02:23.7596675Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7597025Z 2025-05-07T20:02:23.7597869Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7598993Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7599465Z ^ 2025-05-07T20:02:23.7599759Z detected during: 2025-05-07T20:02:23.7614360Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7641881Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7669864Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7685658Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:23.7686785Z 2025-05-07T20:02:23.7687018Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7687371Z 2025-05-07T20:02:23.7688161Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7689262Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7689703Z ^ 2025-05-07T20:02:23.7689956Z detected during: 2025-05-07T20:02:23.7704463Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7731977Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7759822Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7775635Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:23.7776741Z 2025-05-07T20:02:23.7776987Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7777334Z 2025-05-07T20:02:23.7778116Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7779231Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7779659Z ^ 2025-05-07T20:02:23.7779927Z detected during: 
2025-05-07T20:02:23.7794358Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7821888Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:23.7849703Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7865514Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:23.7866659Z 2025-05-07T20:02:23.7866894Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7867240Z 2025-05-07T20:02:23.7868028Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:23.7869159Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:23.7869599Z ^ 2025-05-07T20:02:23.7869865Z detected during: 2025-05-07T20:02:23.7884296Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:23.7911869Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:23.7939827Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<4>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:23.7955707Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=4, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu 2025-05-07T20:02:23.7956821Z 2025-05-07T20:02:23.7957061Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:23.7957440Z 2025-05-07T20:02:24.7532003Z [146/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o 2025-05-07T20:02:24.7544612Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:24.7546158Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7547308Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.7547787Z ^ 2025-05-07T20:02:24.7547967Z 2025-05-07T20:02:24.7548215Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.7548571Z 2025-05-07T20:02:24.7549399Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7550536Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:24.7551006Z ^ 2025-05-07T20:02:24.7551181Z 2025-05-07T20:02:24.7551989Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7553105Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.7553567Z ^ 2025-05-07T20:02:24.7553833Z detected during: 2025-05-07T20:02:24.7568318Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.7595840Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.7624035Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:24.7639921Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:24.7641081Z 2025-05-07T20:02:24.7641350Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.7641704Z 2025-05-07T20:02:24.7642492Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7643640Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.7644083Z ^ 2025-05-07T20:02:24.7644376Z detected during: 2025-05-07T20:02:24.7658809Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.7686285Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.7714346Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.7730189Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:24.7731327Z 2025-05-07T20:02:24.7731604Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.7731961Z 
2025-05-07T20:02:24.7732767Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7733926Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.7734394Z ^ 2025-05-07T20:02:24.7734688Z detected during: 2025-05-07T20:02:24.7749126Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.7776827Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.7804734Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.7820667Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:24.7821787Z 2025-05-07T20:02:24.7822062Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.7822425Z 2025-05-07T20:02:24.7823210Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7824399Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.7824867Z ^ 2025-05-07T20:02:24.7825136Z detected during: 2025-05-07T20:02:24.7839575Z instantiation of "void 
cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.7867136Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.7895002Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.7910969Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:24.7912110Z 2025-05-07T20:02:24.7912351Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.7912728Z 2025-05-07T20:02:24.7913582Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.7914709Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.7915179Z ^ 2025-05-07T20:02:24.7915469Z detected during: 2025-05-07T20:02:24.7929875Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.7957388Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.7985874Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const 
cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, 
cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.8001704Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:24.8002817Z 2025-05-07T20:02:24.8003085Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.8003442Z 2025-05-07T20:02:24.8004257Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:24.8005407Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:24.8005906Z ^ 2025-05-07T20:02:24.8006178Z detected during: 2025-05-07T20:02:24.8020744Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, 
cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:24.8048262Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:24.8076207Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:24.8092029Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu 2025-05-07T20:02:24.8093141Z 2025-05-07T20:02:24.8093387Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:24.8093764Z 2025-05-07T20:02:28.2629760Z [147/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a 
-Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o 2025-05-07T20:02:28.2642145Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:28.2643661Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.2644765Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.2645219Z ^ 2025-05-07T20:02:28.2645388Z 2025-05-07T20:02:28.2645625Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.2645989Z 2025-05-07T20:02:28.2646772Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.2647897Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:28.2648325Z ^ 2025-05-07T20:02:28.2648513Z 2025-05-07T20:02:28.2649268Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.2650373Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.2650799Z ^ 2025-05-07T20:02:28.2651076Z detected during: 2025-05-07T20:02:28.2665375Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, 
cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:28.2692454Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:28.2720399Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:28.2736832Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:28.2737981Z 2025-05-07T20:02:28.2738214Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.2738559Z 2025-05-07T20:02:28.2739345Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.2740435Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.2740891Z ^ 2025-05-07T20:02:28.2741177Z detected during: 2025-05-07T20:02:28.2755819Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:28.2782428Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:28.2810438Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:28.2826343Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:28.2827456Z 2025-05-07T20:02:28.2827700Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.2828074Z 2025-05-07T20:02:28.2828860Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.2830003Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.2830448Z ^ 2025-05-07T20:02:28.2830741Z detected during: 2025-05-07T20:02:28.2844977Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:28.2872049Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:28.2900065Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:28.2915985Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:28.2917128Z 2025-05-07T20:02:28.2917374Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.2917725Z 2025-05-07T20:02:28.2918514Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.2919653Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.2920112Z ^ 2025-05-07T20:02:28.2920379Z detected during: 2025-05-07T20:02:28.2934527Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:28.2961858Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:28.2992013Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:28.3007892Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:28.3009204Z 2025-05-07T20:02:28.3009448Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.3009859Z 2025-05-07T20:02:28.3010667Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.3011791Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.3012248Z ^ 2025-05-07T20:02:28.3012517Z detected during: 
2025-05-07T20:02:28.3027207Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:28.3054801Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:28.3082588Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:28.3097940Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:28.3099019Z 2025-05-07T20:02:28.3099258Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.3099618Z 2025-05-07T20:02:28.3100386Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:28.3101501Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:28.3101942Z ^ 2025-05-07T20:02:28.3102225Z detected during: 2025-05-07T20:02:28.3116977Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, 
int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:28.3144142Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 
2025-05-07T20:02:28.3171597Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:28.3187289Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=false]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu 2025-05-07T20:02:28.3188431Z 2025-05-07T20:02:28.3188671Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:28.3189022Z 2025-05-07T20:02:29.6274116Z [148/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include 
-isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o 2025-05-07T20:02:29.6286397Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:29.6288005Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6289159Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6289636Z ^ 2025-05-07T20:02:29.6289812Z 2025-05-07T20:02:29.6290069Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6290446Z 2025-05-07T20:02:29.6291317Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6292481Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:29.6292937Z ^ 2025-05-07T20:02:29.6293112Z 2025-05-07T20:02:29.6293917Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6295032Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6295495Z ^ 2025-05-07T20:02:29.6295766Z detected during: 2025-05-07T20:02:29.6310511Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:29.6337913Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, 
int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:29.6365592Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, 
cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 
2025-05-07T20:02:29.6381977Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:29.6383090Z 2025-05-07T20:02:29.6383447Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6383838Z 2025-05-07T20:02:29.6384631Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6385778Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6386255Z ^ 2025-05-07T20:02:29.6386550Z detected during: 2025-05-07T20:02:29.6401371Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:29.6429389Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, 
cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:29.6457372Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:29.6473244Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:29.6474384Z 2025-05-07T20:02:29.6474631Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6475017Z 
2025-05-07T20:02:29.6475804Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6476928Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6477408Z ^ 2025-05-07T20:02:29.6477705Z detected during: 2025-05-07T20:02:29.6492258Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:29.6520054Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, 
int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:29.6548525Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, 
cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:29.6564428Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:29.6565541Z 2025-05-07T20:02:29.6565814Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6566203Z 2025-05-07T20:02:29.6567400Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 889; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6569884Z ptxas 
/tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 896; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6572363Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 903; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6574841Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 910; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6577318Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1044; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6579777Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1051; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6582259Z ptxas 
/tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1058; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6584864Z ptxas /tmp/tmpxft_00007e73_00000000-9_f4f4bf16_128_192_2_2_1_t.compute_90.ptx, line 1065; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:29.6586980Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6588130Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6588604Z ^ 2025-05-07T20:02:29.6588881Z detected during: 2025-05-07T20:02:29.6603294Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:29.6630911Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, 
cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:29.6658878Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, 
int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:29.6674683Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:29.6675796Z 2025-05-07T20:02:29.6676061Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6676415Z 2025-05-07T20:02:29.6677203Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6678344Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6678794Z ^ 2025-05-07T20:02:29.6679085Z detected during: 2025-05-07T20:02:29.6693982Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:29.6721616Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:29.6749528Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, 
cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:29.6765280Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:29.6766409Z 2025-05-07T20:02:29.6766681Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6767041Z 2025-05-07T20:02:29.6767854Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the 
implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:29.6768964Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:29.6769410Z ^ 2025-05-07T20:02:29.6769667Z detected during: 2025-05-07T20:02:29.6784182Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:29.6811572Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, 
cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:29.6839421Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:29.6855233Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=192, TBS_M=2, TBS_N=2, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu 2025-05-07T20:02:29.6856332Z 2025-05-07T20:02:29.6856581Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:29.6856928Z 2025-05-07T20:02:30.7249480Z [149/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include 
-I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu -o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o 2025-05-07T20:02:30.7261679Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:30.7263433Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7264559Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7265003Z ^ 2025-05-07T20:02:30.7265169Z 2025-05-07T20:02:30.7265404Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7265764Z 2025-05-07T20:02:30.7266557Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7267688Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:30.7268109Z ^ 2025-05-07T20:02:30.7268271Z 2025-05-07T20:02:30.7269051Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7270179Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7270641Z ^ 2025-05-07T20:02:30.7270902Z detected during: 2025-05-07T20:02:30.7284652Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:30.7313505Z instantiation of "void cutlass::device_kernel(Operator::Params) [with 
Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:30.7341425Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:30.7357246Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:30.7358334Z 2025-05-07T20:02:30.7358565Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7358918Z 2025-05-07T20:02:30.7359673Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7360759Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7361174Z ^ 2025-05-07T20:02:30.7361427Z detected during: 2025-05-07T20:02:30.7374877Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, 
cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:30.7401587Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:30.7429407Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with 
GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:30.7444735Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:30.7445774Z 2025-05-07T20:02:30.7446004Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7446324Z 2025-05-07T20:02:30.7447043Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7448104Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7448513Z ^ 2025-05-07T20:02:30.7448747Z detected during: 2025-05-07T20:02:30.7461957Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, 
cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:30.7488601Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:30.7516425Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, 
cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:30.7532183Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:30.7533294Z 2025-05-07T20:02:30.7533540Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7533918Z 2025-05-07T20:02:30.7534703Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7535841Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7536284Z ^ 2025-05-07T20:02:30.7536578Z detected during: 2025-05-07T20:02:30.7550865Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:30.7577316Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, 
cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:30.7605072Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, 
cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:30.7621049Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:30.7622215Z 2025-05-07T20:02:30.7622446Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7622782Z 2025-05-07T20:02:30.7623569Z 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7624846Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7625315Z ^ 2025-05-07T20:02:30.7625590Z detected during: 2025-05-07T20:02:30.7640236Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, 
cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:30.7666965Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:30.7693558Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:30.7709505Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:30.7710653Z 2025-05-07T20:02:30.7710893Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7711263Z 2025-05-07T20:02:30.7712049Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:30.7713188Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:30.7713632Z ^ 2025-05-07T20:02:30.7713905Z detected during: 
2025-05-07T20:02:30.7728414Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:30.7755834Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:30.7783010Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma, cute::C<1>, cute::C<1>>>, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, 
cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<192>, cute::C<128>>, cute::tuple, cute::Layout>>, cute::SM100_TMEM_LOAD_16dp256b8x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:30.7811164Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=256, TB_N=192, TBS_M=4, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu 2025-05-07T20:02:30.7812497Z 2025-05-07T20:02:30.7812799Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:30.7813160Z 2025-05-07T20:02:31.5422657Z [150/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include 
-I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o 
2025-05-07T20:02:31.5435401Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 2025-05-07T20:02:31.5437096Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5438204Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5438692Z ^ 2025-05-07T20:02:31.5438871Z 2025-05-07T20:02:31.5439118Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.5439505Z 2025-05-07T20:02:31.5440295Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5441451Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:31.5441895Z ^ 2025-05-07T20:02:31.5442105Z 2025-05-07T20:02:31.5442874Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5444008Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5444452Z ^ 2025-05-07T20:02:31.5444761Z detected during: 2025-05-07T20:02:31.5458540Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.5484416Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, 
cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.5510505Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, 
void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.5525865Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.5526933Z 2025-05-07T20:02:31.5527197Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.5527533Z 2025-05-07T20:02:31.5528428Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5529480Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5529891Z ^ 2025-05-07T20:02:31.5530120Z detected during: 2025-05-07T20:02:31.5542482Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:31.5569494Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.5595290Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.5621941Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.5637698Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.5638735Z 2025-05-07T20:02:31.5639493Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5640542Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5640987Z ^ 2025-05-07T20:02:31.5641244Z detected during: 2025-05-07T20:02:31.5654262Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.5680222Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, 
cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.5705688Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, 
cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.5721518Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.5722545Z 2025-05-07T20:02:31.5722778Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.5723136Z 2025-05-07T20:02:31.5724028Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5725075Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5725454Z ^ 2025-05-07T20:02:31.5725676Z 
detected during: 2025-05-07T20:02:31.5737978Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, 
TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:31.5764462Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.5789737Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.5816169Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.5832010Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.5833127Z 2025-05-07T20:02:31.5833904Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5835025Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5835454Z ^ 2025-05-07T20:02:31.5835753Z detected during: 2025-05-07T20:02:31.5848833Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, 
cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.5874785Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, 
cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.5900035Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, 
cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.5915509Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.5916655Z 2025-05-07T20:02:31.5916902Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.5917264Z 2025-05-07T20:02:31.5918086Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.5919189Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.5919626Z ^ 2025-05-07T20:02:31.5919896Z detected during: 2025-05-07T20:02:31.5932501Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, 
int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:31.5958837Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.5983399Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6009997Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6025821Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6026931Z 2025-05-07T20:02:31.6028153Z ptxas /tmp/tmpxft_00007e7f_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 835; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:31.6030655Z ptxas /tmp/tmpxft_00007e7f_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 848; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:31.6033164Z ptxas /tmp/tmpxft_00007e7f_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 988; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:31.6035673Z ptxas /tmp/tmpxft_00007e7f_00000000-9_f4f4bf16_128_256_2_1_1_t.compute_90.ptx, line 1001; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_90' as this feature is expected to have substantially reduced performance on some future architectures 2025-05-07T20:02:31.6037721Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.6038787Z Tensor mSFB_tmp = 
observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.6039227Z ^ 2025-05-07T20:02:31.6039480Z detected during: 2025-05-07T20:02:31.6052443Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, 
cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.6078332Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, 
cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6103428Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, 
cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6119104Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6120346Z 2025-05-07T20:02:31.6120582Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.6120949Z 2025-05-07T20:02:31.6121798Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.6122837Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.6123221Z ^ 2025-05-07T20:02:31.6123514Z detected during: 2025-05-07T20:02:31.6135993Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, 
ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:31.6162422Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.6187859Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6214520Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6230394Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6231524Z 2025-05-07T20:02:31.6232311Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.6233456Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.6233901Z ^ 2025-05-07T20:02:31.6234190Z detected during: 2025-05-07T20:02:31.6248001Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, 
cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.6274005Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, 
cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6299283Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, 
cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6314664Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6315798Z 2025-05-07T20:02:31.6316043Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.6316400Z 2025-05-07T20:02:31.6317260Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.6318397Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.6318832Z ^ 2025-05-07T20:02:31.6319067Z detected during: 2025-05-07T20:02:31.6331883Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, 
GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:31.6358223Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, 
cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.6382846Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, 
cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6409425Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, 
cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6425178Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6426279Z 2025-05-07T20:02:31.6427084Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 
2025-05-07T20:02:31.6428379Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.6428846Z ^ 2025-05-07T20:02:31.6429115Z detected during: 2025-05-07T20:02:31.6442726Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, 
cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.6467853Z instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, 
cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6493834Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, 
int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6508787Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6509899Z 2025-05-07T20:02:31.6510213Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:31.6510605Z 2025-05-07T20:02:31.6511395Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(729): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:31.6512518Z return observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:31.6512964Z ^ 2025-05-07T20:02:31.6513226Z detected during: 2025-05-07T20:02:31.6526349Z instantiation of "auto cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::load_init(const ProblemShape_MNKL &, 
cutlass::gemm::collective::CollectiveMma, TileShape_, ElementPairA_, StridePairA_, ElementPairB_, StridePairB_, TiledMma_, GmemTiledCopyPairA_, SmemLayoutAtomPairA_, SmemCopyAtomA_, TransformA_, GmemTiledCopyPairB_, SmemLayoutAtomPairB_, SmemCopyAtomB_, TransformB_>::TensorStorage &) const [with Stages=4, SchedulerPipelineStageCount=3, AccumulatorPipelineStageCount=1, ClusterShape=cute::tuple, TileShape_=cute::tuple, cute::C<256>, cute::C<128>>, ElementPairA_=cute::tuple, StridePairA_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, ElementPairB_=cute::tuple, StridePairB_=cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, TiledMma_=cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, GmemTiledCopyPairA_=cute::tuple, SmemLayoutAtomPairA_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, SmemCopyAtomA_=void, TransformA_=cute::identity, GmemTiledCopyPairB_=cute::tuple, SmemLayoutAtomPairB_=cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, SmemCopyAtomB_=void, TransformB_=cute::identity, ProblemShape_MNKL=cute::tuple]" at line 595 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/kernel/sm100_gemm_tma_warpspecialized.hpp 2025-05-07T20:02:31.6552305Z instantiation of "void cutlass::gemm::kernel::GemmUniversal>::operator()(const cutlass::gemm::kernel::GemmUniversal>::Params &, char *) [with ProblemShape_=cute::tuple, 
CollectiveMainloop_=cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, CollectiveEpilogue_=cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, TileSchedulerTag_=void]" at line 122 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/device_kernel.h 2025-05-07T20:02:31.6577273Z 
instantiation of "void cutlass::device_kernel(Operator::Params) [with Operator=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 340 of 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/device/gemm_universal_adapter.h 2025-05-07T20:02:31.6603587Z instantiation of "cutlass::Status cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::initialize(const cutlass::gemm::device::GemmUniversalAdapter, void>::value, void>>::Arguments &, void *, cudaStream_t, cutlass::CudaHostAdapter *) [with GemmKernel_=cutlass::gemm::kernel::GemmUniversal, cutlass::gemm::collective::CollectiveMma>, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::tuple, cute::tuple, int64_t>, cute::Layout, int32_t>, cute::tuple, int32_t>, cute::tuple>, cute::tuple, int32_t>, cute::tuple, cute::C<1>>, cute::_512>, cute::tuple, int32_t>>>>, cute::TiledMMA>, cute::Layout, cute::tuple, cute::C<0>, cute::C<0>>>, cute::tuple>, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<1>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<512>>>>>, void, cute::identity, cute::tuple, cute::tuple, cute::smem_ptr_flag_bits<4>, cute::Layout, cute::tuple>>, cute::Layout, cute::C<2>>, cute::tuple>, cute::_1, cute::tuple>, cute::tuple, cute::C<512>>, cute::tuple, cute::C<1>>>, cute::_0, cute::tuple, cute::C<1024>>>>>, void, cute::identity>, cutlass::epilogue::collective::CollectiveEpilogue, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::bfloat16_t, cute::tuple, int64_t, int64_t>, cutlass::epilogue::fusion::FusionCallbacks, cutlass::epilogue::fusion::LinearCombination, cute::tuple, cute::C<256>, cute::C<128>>, cute::tuple>, cute::SM100_TMEM_LOAD_16dp256b16x, cute::SM90_TMA_LOAD, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, 
cute::C<8>>, cute::tuple>>>, cute::SM75_U16x8_LDSM_T, cute::SM90_TMA_STORE, cute::ComposedLayout, cute::smem_ptr_flag_bits<16>, cute::Layout, cute::C<8>>, cute::tuple>>>, cute::SM90_U16x8_STSM_T, cute::AutoVectorizingCopyWithAssumedAlignment<128>>, void, void>]" at line 220 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_common.cuh 2025-05-07T20:02:31.6618572Z instantiation of "at::Tensor _f4f4bf16(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) [with TB_M=128, TB_N=256, TBS_M=2, TBS_N=1, TBS_K=1, USE_MX=true]" at line 22 of /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu 2025-05-07T20:02:31.6619660Z 2025-05-07T20:02:32.4029331Z [151/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/bf16i4bf16.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o 2025-05-07T20:02:32.4041088Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:02:32.4042527Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:32.4043652Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:02:32.4044067Z ^ 2025-05-07T20:02:32.4044256Z 2025-05-07T20:02:32.4044489Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:02:32.4044821Z 2025-05-07T20:02:32.4045584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:02:32.4046632Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:02:32.4047062Z ^ 2025-05-07T20:02:32.4047221Z 2025-05-07T20:03:17.4955681Z [152/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o 2025-05-07T20:03:17.4967720Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:03:17.4969302Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:17.4970439Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:17.4970915Z ^ 2025-05-07T20:03:17.4971092Z 2025-05-07T20:03:17.4971423Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:17.4971779Z 2025-05-07T20:03:17.4972584Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:17.4973758Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:17.4974208Z ^ 2025-05-07T20:03:17.4974406Z 2025-05-07T20:03:20.1722480Z [153/156] /github/home/miniconda/envs/build_binary/bin/nvcc -forward-unknown-to-host-compiler -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -Dfbgemm_gpu_experimental_gen_ai_EXPORTS -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/asmjit/src -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cpuinfo/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/tools/util/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/composable_kernel/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/json/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -I/__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem 
/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include -isystem /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /github/home/miniconda/envs/build_binary/targets/x86_64-linux/include -DONNX_NAMESPACE=onnx_c2 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90a,code=sm_90a -gencode arch=compute_100a,code=sm_100a -gencode arch=compute_120a,code=sm_120a -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O3 -DNDEBUG -std=c++20 -Xcompiler=-fPIC -Wno-deprecated-anon-enum-enum-conversion -Wno-deprecated-declarations -MD -MT experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o -MF experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o.d -x cu -c /__w/FBGEMM/FBGEMM/fbgemm_gpu/experimental/gen_ai/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu -o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o 2025-05-07T20:03:20.1734304Z nvcc warning : Support for offline compilation for architectures prior to '_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). 
2025-05-07T20:03:20.1735777Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_warpspecialized.hpp(719): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:20.1736825Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB_)); 2025-05-07T20:03:20.1737230Z ^ 2025-05-07T20:03:20.1737390Z 2025-05-07T20:03:20.1737632Z Remark: The warnings can be suppressed with "-diag-suppress " 2025-05-07T20:03:20.1737960Z 2025-05-07T20:03:20.1738940Z /__w/FBGEMM/FBGEMM/fbgemm_gpu/../external/cutlass/include/cutlass/gemm/collective/sm100_blockscaled_mma_array_warpspecialized.hpp(684): warning #2908-D: the implicit by-copy capture of "this" is deprecated 2025-05-07T20:03:20.1740058Z Tensor mSFB_tmp = observed_tma_load_sfb_->get_tma_tensor(shape(layout_SFB)); 2025-05-07T20:03:20.1740495Z ^ 2025-05-07T20:03:20.1740675Z 2025-05-07T20:03:20.7655524Z [154/156] : && /github/home/miniconda/envs/build_binary/bin/x86_64-conda-linux-gnu-c++ -fPIC -DTORCH_USE_CUDA_DSA -DTORCH_USE_HIP_DSA -L/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/github/home/miniconda/envs/build_binary/lib -Wl,-rpath-link,/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib -L/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs -s -shared -Wl,-soname,fbgemm_gpu_experimental_gen_ai.so -o experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/attention.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cpp.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cpp.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/attention/gqa_attn_splitk.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/coalesce/coalesce.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/quantize.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/comm/car.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/gather_scatter/gather_scatter.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/moe/index_shuffling.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/kv_cache/kv_cache.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16bf16bf16_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/bf16i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_128_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_128_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_128_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_192_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_2_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_2_4_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f4f4bf16/f4f4bf16_256_256_4_1_1_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_cublas.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_lite.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_128_128_2_1_1_t_f.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_2_1_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_128_256_128_4_4_1_f_t.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_128_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_16_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_1_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_256_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_32_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise/f8f8bf16_rowwise_64_64_128_2_1_1_f_f.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_cluster_size_and_transpose.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/dispatch_fp8_rowwise_batched_kernel_on_tile_size.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/f8f8bf16_rowwise_batched_impl.cu.o 
experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched/handle_transposition.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_rowwise_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/f8i4bf16_shuffled_grouped.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/i8i8bf16_dynamic.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/cutlass_extensions/mixed_dtype_utils.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/bf16fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/fp8fp8bf16_fast_gemv.cu.o experimental/gen_ai/CMakeFiles/fbgemm_gpu_experimental_gen_ai.dir/src/quantize/fast_gemv/include/fast_gemv.cu.o -L/lib/intel64 -L/lib/intel64_win -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib:/github/home/miniconda/envs/build_binary/lib/stubs: /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so 
/github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libnvrtc.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2 /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/stubs/libcuda.so /github/home/miniconda/envs/build_binary/lib/stubs/libnvidia-ml.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10_cuda.so /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libc10.so /github/home/miniconda/envs/build_binary/targets/x86_64-linux/lib/libcudart.so -Wl,--no-as-needed,"/github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/torch/lib/libtorch.so" -Wl,--as-needed -lcudadevrt -lcudart_static -lrt -lpthread -ldl && : 2025-05-07T20:03:20.9898752Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai && bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/../.github/scripts/fbgemm_gpu_postbuild.bash /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:20.9900204Z ################################################################################ 2025-05-07T20:03:20.9900565Z [CMAKE] Running post-build script ... 2025-05-07T20:03:20.9901243Z Target file: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:20.9901939Z Removing all RPATHs ... 
2025-05-07T20:03:20.9902223Z ################################################################################ 2025-05-07T20:03:20.9903292Z [155/156] cd /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-build && /github/home/miniconda/envs/build_binary/lib/python3.12/site-packages/cmake/data/bin/cmake -P cmake_install.cmake 2025-05-07T20:03:21.0825063Z -- Install configuration: "Release" 2025-05-07T20:03:21.0853667Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/asmjit.so 2025-05-07T20:03:21.0898483Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/fbgemm.so 2025-05-07T20:03:21.0940603Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:21.0961059Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench 2025-05-07T20:03:21.0980369Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/__init__.py 2025-05-07T20:03:21.0991144Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py 2025-05-07T20:03:21.0994438Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py 2025-05-07T20:03:21.0995556Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py 2025-05-07T20:03:21.0996624Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py 2025-05-07T20:03:21.0997626Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py 2025-05-07T20:03:21.1001995Z 
-- Up-to-date: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:21.1006495Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:21.1028976Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md 2025-05-07T20:03:21.1034024Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py 2025-05-07T20:03:21.1035133Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py 2025-05-07T20:03:21.1036481Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py 2025-05-07T20:03:21.1037507Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py 2025-05-07T20:03:21.1038528Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py 2025-05-07T20:03:21.1041804Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py 2025-05-07T20:03:21.1049395Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py 2025-05-07T20:03:21.1076986Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:21.1106387Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/example/__init__.py 2025-05-07T20:03:21.1110033Z -- Installing: 
/__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/example/utils.py 2025-05-07T20:03:21.1162000Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py 2025-05-07T20:03:21.1163462Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py 2025-05-07T20:03:21.1164690Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py 2025-05-07T20:03:21.1165829Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py 2025-05-07T20:03:21.1166942Z -- Installing: /__w/FBGEMM/FBGEMM/fbgemm_gpu/_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py 2025-05-07T20:03:21.1443711Z 2025-05-07T20:03:21.4831610Z 2025-05-07T20:03:21.4848868Z copying fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/__init__.py 2025-05-07T20:03:21.4966975Z copying fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py 2025-05-07T20:03:21.4969624Z copying fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/enums.py 2025-05-07T20:03:21.4974112Z copying fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/metrics.py 2025-05-07T20:03:21.4983356Z copying fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py 2025-05-07T20:03:21.5005664Z copying fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py 2025-05-07T20:03:21.5009262Z copying fbgemm_gpu/quantize_comm.py -> 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize_comm.py 2025-05-07T20:03:21.5015142Z copying fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize_utils.py 2025-05-07T20:03:21.5025198Z copying fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/runtime_monitor.py 2025-05-07T20:03:21.5030001Z copying fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sparse_ops.py 2025-05-07T20:03:21.5050339Z copying fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_configs.py 2025-05-07T20:03:21.5054260Z copying fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py 2025-05-07T20:03:21.5060826Z copying fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py 2025-05-07T20:03:21.5064554Z copying fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_utils.py 2025-05-07T20:03:21.5072025Z copying fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py 2025-05-07T20:03:21.5080597Z copying fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py 2025-05-07T20:03:21.5088465Z copying fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py 2025-05-07T20:03:21.5108239Z copying fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py 2025-05-07T20:03:21.5135626Z copying 
fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py 2025-05-07T20:03:21.5145459Z copying fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py 2025-05-07T20:03:21.5160813Z copying fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py 2025-05-07T20:03:21.5166485Z copying fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/uvm.py 2025-05-07T20:03:21.5174227Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config 2025-05-07T20:03:21.5204917Z copying fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config/__init__.py 2025-05-07T20:03:21.5215073Z copying fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config/feature_list.py 2025-05-07T20:03:21.5220377Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs 2025-05-07T20:03:21.5247751Z copying fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/__init__.py 2025-05-07T20:03:21.5251728Z copying fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/common.py 2025-05-07T20:03:21.5257916Z copying fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/examples.py 2025-05-07T20:03:21.5264761Z copying fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py 2025-05-07T20:03:21.5273741Z copying fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py 2025-05-07T20:03:21.5281969Z copying fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py 2025-05-07T20:03:21.5289573Z copying fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/quantize_ops.py 2025-05-07T20:03:21.5297822Z copying fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/sparse_ops.py 2025-05-07T20:03:21.5314277Z copying fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/version.py 2025-05-07T20:03:21.5324416Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize 2025-05-07T20:03:21.5332360Z copying fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize/__init__.py 2025-05-07T20:03:21.5338708Z copying fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 2025-05-07T20:03:21.5346252Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll 2025-05-07T20:03:21.5348350Z copying fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/__init__.py 2025-05-07T20:03:21.5355542Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe 2025-05-07T20:03:21.5372995Z copying fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/__init__.py 2025-05-07T20:03:21.5374300Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton 2025-05-07T20:03:21.5375045Z copying fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/__init__.py 2025-05-07T20:03:21.5392436Z copying fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/common.py 2025-05-07T20:03:21.5399702Z copying fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/quantize.py 2025-05-07T20:03:21.5408453Z copying fbgemm_gpu/triton/quantize_ref.py -> 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/quantize_ref.py 2025-05-07T20:03:21.5414315Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils 2025-05-07T20:03:21.5442824Z copying fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/__init__.py 2025-05-07T20:03:21.5445438Z copying fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/filestore.py 2025-05-07T20:03:21.5453794Z copying fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/loader.py 2025-05-07T20:03:21.5460190Z copying fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/torch_library.py 2025-05-07T20:03:21.5465741Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu 2025-05-07T20:03:21.5466488Z copying fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu/__init__.py 2025-05-07T20:03:21.5473630Z copying fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py 2025-05-07T20:03:21.5483182Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta 2025-05-07T20:03:21.5485325Z copying fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta/__init__.py 2025-05-07T20:03:21.5492410Z copying fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py 2025-05-07T20:03:21.5501218Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton 2025-05-07T20:03:21.5503575Z copying fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/__init__.py 2025-05-07T20:03:21.5520631Z copying fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/common.py 2025-05-07T20:03:21.5537221Z copying 
fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py 2025-05-07T20:03:21.5540593Z copying fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py 2025-05-07T20:03:21.5547196Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py 2025-05-07T20:03:21.5559330Z copying fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py 2025-05-07T20:03:21.5578807Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py 2025-05-07T20:03:21.5582188Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py 2025-05-07T20:03:21.5587081Z copying fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py 2025-05-07T20:03:21.5598550Z copying fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py 2025-05-07T20:03:21.5607684Z copying fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py 2025-05-07T20:03:21.5621025Z copying fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py 2025-05-07T20:03:21.5628177Z copying 
fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py 2025-05-07T20:03:21.5638112Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench 2025-05-07T20:03:21.5639060Z copying fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/__init__.py 2025-05-07T20:03:21.5647023Z copying fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py 2025-05-07T20:03:21.5655608Z copying fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py 2025-05-07T20:03:21.5661892Z copying fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py 2025-05-07T20:03:21.5670080Z copying fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py 2025-05-07T20:03:21.5689839Z copying fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py 2025-05-07T20:03:21.5690887Z copying fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/reporter.py 2025-05-07T20:03:21.5699682Z copying fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py 2025-05-07T20:03:21.5708593Z copying fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py 2025-05-07T20:03:21.5717399Z copying fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py 2025-05-07T20:03:21.5722419Z copying fbgemm_gpu/tbe/bench/utils.py -> 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/utils.py 2025-05-07T20:03:21.5730659Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache 2025-05-07T20:03:21.5732874Z copying fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache/__init__.py 2025-05-07T20:03:21.5741770Z copying fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py 2025-05-07T20:03:21.5745822Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:21.5747956Z copying fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py 2025-05-07T20:03:21.5762278Z copying fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/common.py 2025-05-07T20:03:21.5765398Z copying fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/inference.py 2025-05-07T20:03:21.5773189Z copying fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/training.py 2025-05-07T20:03:21.5786279Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats 2025-05-07T20:03:21.5788634Z copying fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats/__init__.py 2025-05-07T20:03:21.5796336Z copying fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py 2025-05-07T20:03:21.5803236Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils 2025-05-07T20:03:21.5805400Z copying fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/__init__.py 2025-05-07T20:03:21.5808463Z copying fbgemm_gpu/tbe/utils/common.py -> 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/common.py 2025-05-07T20:03:21.5816588Z copying fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/offsets.py 2025-05-07T20:03:21.5822995Z copying fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/quantize.py 2025-05-07T20:03:21.5832097Z copying fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/requests.py 2025-05-07T20:03:21.5841778Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:21.5844108Z copying fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py 2025-05-07T20:03:21.5853292Z copying fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py 2025-05-07T20:03:21.5861632Z creating directory _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged 2025-05-07T20:03:21.5862662Z copying fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged/__init__.py 2025-05-07T20:03:21.5867281Z copying fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py 2025-05-07T20:03:21.6012381Z 2025-05-07T20:03:22.1817049Z INFO:root:running bdist_wheel 2025-05-07T20:03:22.3203115Z INFO:root:running build 2025-05-07T20:03:22.3203962Z INFO:root:running build_py 2025-05-07T20:03:22.3500186Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3561911Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3565968Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3568804Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3570026Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3571405Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3573085Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3574555Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3575880Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3577202Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3578565Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3579929Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 
2025-05-07T20:03:22.3581442Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3582910Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3584427Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3585877Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3587411Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3588935Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3592546Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3608421Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3611651Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 
2025-05-07T20:03:22.3613097Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3614373Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.3620674Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config 2025-05-07T20:03:22.3623471Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config 2025-05-07T20:03:22.3638086Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config 2025-05-07T20:03:22.3639225Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3658885Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3662608Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3664229Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3665685Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3667415Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3677718Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3681917Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3683723Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3687168Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.3694346Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize 2025-05-07T20:03:22.3702644Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize 2025-05-07T20:03:22.3706939Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize 2025-05-07T20:03:22.3713809Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll 2025-05-07T20:03:22.3716820Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll 2025-05-07T20:03:22.3719781Z INFO:root:creating 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe 2025-05-07T20:03:22.3720790Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe 2025-05-07T20:03:22.3721836Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.3722888Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.3724232Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.3725595Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.3727110Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.3736438Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.3739041Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.3742405Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.3747992Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 
2025-05-07T20:03:22.3758168Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.3776811Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu 2025-05-07T20:03:22.3780213Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu 2025-05-07T20:03:22.3782917Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu 2025-05-07T20:03:22.3784224Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta 2025-05-07T20:03:22.3785368Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta 2025-05-07T20:03:22.3786757Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta 2025-05-07T20:03:22.3787908Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3789067Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3790495Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3792061Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3793725Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3795530Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3797110Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3817630Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3819381Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3821182Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3822956Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3824741Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3826430Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3828074Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.3829377Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3830566Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3832020Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3833512Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3834927Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3844025Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3848535Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3851189Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3852658Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3854174Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3855732Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3857194Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.3858332Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache 2025-05-07T20:03:22.3859442Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache 2025-05-07T20:03:22.3862227Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache 2025-05-07T20:03:22.3864326Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 
2025-05-07T20:03:22.3865491Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.3867098Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.3868605Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.3878500Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.3884629Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats 2025-05-07T20:03:22.3885805Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats 2025-05-07T20:03:22.3887427Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats 2025-05-07T20:03:22.3889452Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.3890581Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.3892107Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.3893609Z 
INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.3895044Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.3896609Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.3898643Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:22.3899825Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:22.3901508Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:22.3903301Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged 2025-05-07T20:03:22.3904768Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged 2025-05-07T20:03:22.3906349Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged 2025-05-07T20:03:22.4292140Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 
2025-05-07T20:03:22.4328535Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.4563591Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:22.4567953Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:22.8538171Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8540945Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8542769Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8560509Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8562387Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8589251Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8596541Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench 2025-05-07T20:03:22.8615500Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8619395Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8624564Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8643754Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8647995Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8656587Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8669807Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:22.8678088Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/__init__.py -> 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:22.8687375Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:22.8690605Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example 2025-05-07T20:03:22.8691976Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example 2025-05-07T20:03:22.8737700Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example 2025-05-07T20:03:22.8741890Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example 2025-05-07T20:03:22.8746832Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:22.8748676Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:22.8758914Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:22.8780056Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:22.8787329Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:22.8797641Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:22.8805491Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8813766Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8817584Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8825236Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8829147Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8838117Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8842162Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8845676Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8846991Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8848608Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8850707Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8852225Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8853848Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8855393Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8856864Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8858431Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8860014Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8862675Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8873730Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8878203Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8881835Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8883090Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu 2025-05-07T20:03:22.8884379Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config 2025-05-07T20:03:22.8885768Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config 2025-05-07T20:03:22.8887110Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 
2025-05-07T20:03:22.8888434Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8889795Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8891198Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8892675Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8894165Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8895611Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8896970Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8898317Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs 2025-05-07T20:03:22.8899687Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize 2025-05-07T20:03:22.8901113Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/quantize/quantize_ops.py 
-> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize 2025-05-07T20:03:22.8902485Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll 2025-05-07T20:03:22.8903812Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe 2025-05-07T20:03:22.8905356Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.8906935Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.8908409Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.8910523Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton 2025-05-07T20:03:22.8912210Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.8913819Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.8915339Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.8916896Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils 2025-05-07T20:03:22.8918442Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu 2025-05-07T20:03:22.8920063Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu 2025-05-07T20:03:22.8921761Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta 2025-05-07T20:03:22.8923352Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta 2025-05-07T20:03:22.8924900Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8926553Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8928566Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8930214Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8931939Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8933877Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8935579Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8937326Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8939019Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8940854Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8942544Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8944340Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8945953Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton 2025-05-07T20:03:22.8947501Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8949121Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8950703Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8952299Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8953964Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8955693Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8958912Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8960377Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8962377Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8964018Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8965508Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench 2025-05-07T20:03:22.8967135Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache 2025-05-07T20:03:22.8968722Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache 2025-05-07T20:03:22.8970163Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.8971700Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.8973240Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.8974890Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:22.8977268Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats 2025-05-07T20:03:22.8978876Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats 2025-05-07T20:03:22.8980387Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.8981947Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.8983730Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.8985259Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.8986764Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils 2025-05-07T20:03:22.8988850Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:22.8990665Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:22.8992224Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged 2025-05-07T20:03:22.8993858Z INFO:root:copying _skbuild/linux-x86_64-3.12/cmake-install/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged 2025-05-07T20:03:22.9054871Z INFO:skbuild:copied 90 files 2025-05-07T20:03:22.9055753Z INFO:root:running build_ext 2025-05-07T20:03:22.9479235Z INFO:root:installing to _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:22.9479964Z INFO:root:running install 2025-05-07T20:03:22.9914006Z INFO:root:running install_lib 2025-05-07T20:03:22.9966134Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:22.9975062Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu 2025-05-07T20:03:22.9977327Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/config 2025-05-07T20:03:22.9980753Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:22.9982882Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/config/feature_list.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/config 2025-05-07T20:03:22.9984248Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/docs 2025-05-07T20:03:22.9985364Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9986890Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9988428Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/examples.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9989974Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9991626Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/merge_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9993313Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/permute_pooled_embedding_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9995102Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9996728Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/sparse_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9998327Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/docs/version.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/docs 2025-05-07T20:03:22.9999482Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/quantize 2025-05-07T20:03:23.0000698Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:23.0002368Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/quantize 2025-05-07T20:03:23.0003541Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll 2025-05-07T20:03:23.0004310Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/cpu 2025-05-07T20:03:23.0005473Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:23.0007044Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/cpu/cpu_sll.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/cpu 2025-05-07T20:03:23.0008223Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/meta 2025-05-07T20:03:23.0009596Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:23.0011198Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/meta/meta_sll.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/meta 2025-05-07T20:03:23.0012403Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0013594Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0015215Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0016955Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0018769Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0020525Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_bmm.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0022653Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0024617Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0026513Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 
2025-05-07T20:03:23.0028472Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0030348Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0032261Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0034096Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_jagged_softmax.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0035925Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll/triton 2025-05-07T20:03:23.0037590Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sll/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/sll 2025-05-07T20:03:23.0038726Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe 2025-05-07T20:03:23.0039516Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0040701Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/__init__.py -> 
_skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0042341Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/bench_config.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0043993Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/bench_runs.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0045589Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/eeg_cli.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0047279Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/embedding_ops_common_config.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0049018Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/eval_compression.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0050730Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/reporter.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0052434Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/tbe_data_config.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0054138Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/tbe_data_config_loader.py -> 
_skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0055928Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0057590Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/bench/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/bench 2025-05-07T20:03:23.0058802Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/cache 2025-05-07T20:03:23.0060057Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:23.0061709Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/cache 2025-05-07T20:03:23.0062986Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd 2025-05-07T20:03:23.0063847Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:23.0065120Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 2025-05-07T20:03:23.0066909Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd/utils 
2025-05-07T20:03:23.0068645Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:23.0070193Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:23.0071793Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/inference.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:23.0073395Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/ssd/training.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/ssd 2025-05-07T20:03:23.0074585Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/stats 2025-05-07T20:03:23.0075795Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:23.0077468Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/stats/bench_params_reporter.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/stats 2025-05-07T20:03:23.0078776Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/tbe/utils 2025-05-07T20:03:23.0080011Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:23.0081624Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils/common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:23.0083253Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils/offsets.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:23.0084883Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:23.0086516Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/utils/requests.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe/utils 2025-05-07T20:03:23.0088078Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/tbe 2025-05-07T20:03:23.0089223Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton 2025-05-07T20:03:23.0090041Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/triton/jagged 2025-05-07T20:03:23.0091272Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:23.0093023Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton/jagged 2025-05-07T20:03:23.0094716Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:23.0096242Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:23.0097816Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:23.0099410Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/triton/quantize_ref.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/triton 2025-05-07T20:03:23.0100586Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/utils 2025-05-07T20:03:23.0101742Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:23.0103354Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils/filestore.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:23.0104891Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils/loader.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:23.0106520Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/utils/torch_library.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/utils 2025-05-07T20:03:23.0108069Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/asmjit.so -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0109618Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/fbgemm.so -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0140954Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental 2025-05-07T20:03:23.0143419Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:23.0146623Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:23.0747303Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0753504Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe/README.md -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0755416Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0757598Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe/activation.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0759772Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0761629Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe/layers.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0763485Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/moe/shuffling.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai/moe 2025-05-07T20:03:23.0765307Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:23.0767064Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gen_ai/quantize.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gen_ai 2025-05-07T20:03:23.0768426Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0769764Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0771542Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench/ck_bf16_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0774056Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench/comm_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0775991Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench/gather_scatter_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0777885Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench/quantize_bench.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0779758Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/bench/quantize_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/bench 2025-05-07T20:03:23.0781123Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/example 2025-05-07T20:03:23.0782637Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:23.0784683Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:23.0786501Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/example/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/example 2025-05-07T20:03:23.0787863Z INFO:root:creating 
_skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm 2025-05-07T20:03:23.0788765Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:23.0790238Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:23.0792184Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:23.0794130Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:23.0796152Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:23.0798148Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/experimental/gemm/triton_gemm/utils.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu/experimental/gemm/triton_gemm 2025-05-07T20:03:23.0799829Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/__init__.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0801327Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/batched_unary_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0802883Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/enums.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0804353Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/metrics.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0805866Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/permute_pooled_embedding_modules.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0807560Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/permute_pooled_embedding_modules_split.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0809432Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize_comm.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0810893Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/quantize_utils.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0812450Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/runtime_monitor.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0813943Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/sparse_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 
2025-05-07T20:03:23.0828419Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_embedding_configs.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0830320Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_embedding_inference_converter.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0832001Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_embedding_optimizer_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0833560Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_embedding_utils.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0835180Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0836862Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_table_batched_embeddings_ops_common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0838562Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_table_batched_embeddings_ops_inference.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0840298Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_table_batched_embeddings_ops_training.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0842058Z INFO:root:copying 
_skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0844018Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0845624Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/tbe_input_multiplexer.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0847159Z INFO:root:copying _skbuild/linux-x86_64-3.12/setuptools/lib.linux-x86_64-cpython-312/fbgemm_gpu/uvm.py -> _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu 2025-05-07T20:03:23.0848108Z INFO:skbuild:copied 115 files 2025-05-07T20:03:23.0848420Z INFO:root:running install_egg_info 2025-05-07T20:03:23.1289238Z INFO:root:running egg_info 2025-05-07T20:03:23.1334585Z INFO:root:creating fbgemm_gpu_genai_nightly.egg-info 2025-05-07T20:03:23.1340917Z INFO:root:writing fbgemm_gpu_genai_nightly.egg-info/PKG-INFO 2025-05-07T20:03:23.1587407Z INFO:root:writing dependency_links to fbgemm_gpu_genai_nightly.egg-info/dependency_links.txt 2025-05-07T20:03:23.1666186Z INFO:root:writing requirements to fbgemm_gpu_genai_nightly.egg-info/requires.txt 2025-05-07T20:03:23.1670620Z INFO:root:writing top-level names to fbgemm_gpu_genai_nightly.egg-info/top_level.txt 2025-05-07T20:03:23.1725745Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:23.1996219Z INFO:root:reading manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:23.2033339Z INFO:root:writing manifest file 'fbgemm_gpu_genai_nightly.egg-info/SOURCES.txt' 2025-05-07T20:03:23.2041559Z INFO:root:Copying 
fbgemm_gpu_genai_nightly.egg-info to _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/./fbgemm_gpu_genai_nightly-2025.5.7-py3.12.egg-info 2025-05-07T20:03:23.2083086Z INFO:root:running install_scripts 2025-05-07T20:03:23.2084046Z INFO:skbuild:copied 0 files 2025-05-07T20:03:30.8526594Z INFO:root:creating _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel/fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL 2025-05-07T20:03:30.8928665Z INFO:wheel:creating '/__w/FBGEMM/FBGEMM/fbgemm_gpu/dist/.tmp-9u1c9m79/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl' and adding '_skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel' to it 2025-05-07T20:03:30.9075940Z INFO:wheel:adding 'fbgemm_gpu/__init__.py' 2025-05-07T20:03:30.9664580Z INFO:wheel:adding 'fbgemm_gpu/asmjit.so' 2025-05-07T20:03:30.9675558Z INFO:wheel:adding 'fbgemm_gpu/batched_unary_embeddings_ops.py' 2025-05-07T20:03:30.9678192Z INFO:wheel:adding 'fbgemm_gpu/enums.py' 2025-05-07T20:03:31.1289242Z INFO:wheel:adding 'fbgemm_gpu/fbgemm.so' 2025-05-07T20:03:31.1406752Z INFO:wheel:adding 'fbgemm_gpu/metrics.py' 2025-05-07T20:03:31.1408131Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules.py' 2025-05-07T20:03:31.1410135Z INFO:wheel:adding 'fbgemm_gpu/permute_pooled_embedding_modules_split.py' 2025-05-07T20:03:31.1411533Z INFO:wheel:adding 'fbgemm_gpu/quantize_comm.py' 2025-05-07T20:03:31.1414005Z INFO:wheel:adding 'fbgemm_gpu/quantize_utils.py' 2025-05-07T20:03:31.1417106Z INFO:wheel:adding 'fbgemm_gpu/runtime_monitor.py' 2025-05-07T20:03:31.1428275Z INFO:wheel:adding 'fbgemm_gpu/sparse_ops.py' 2025-05-07T20:03:31.1431560Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_configs.py' 2025-05-07T20:03:31.1433970Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_inference_converter.py' 2025-05-07T20:03:31.1435471Z INFO:wheel:adding 'fbgemm_gpu/split_embedding_optimizer_ops.py' 2025-05-07T20:03:31.1436852Z INFO:wheel:adding 
'fbgemm_gpu/split_embedding_utils.py' 2025-05-07T20:03:31.1438700Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops.py' 2025-05-07T20:03:31.1441785Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_common.py' 2025-05-07T20:03:31.1465922Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_inference.py' 2025-05-07T20:03:31.1504815Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training.py' 2025-05-07T20:03:31.1508471Z INFO:wheel:adding 'fbgemm_gpu/split_table_batched_embeddings_ops_training_common.py' 2025-05-07T20:03:31.1510531Z INFO:wheel:adding 'fbgemm_gpu/ssd_split_table_batched_embeddings_ops.py' 2025-05-07T20:03:31.1512176Z INFO:wheel:adding 'fbgemm_gpu/tbe_input_multiplexer.py' 2025-05-07T20:03:31.1513639Z INFO:wheel:adding 'fbgemm_gpu/uvm.py' 2025-05-07T20:03:31.1515526Z INFO:wheel:adding 'fbgemm_gpu/config/__init__.py' 2025-05-07T20:03:31.1517264Z INFO:wheel:adding 'fbgemm_gpu/config/feature_list.py' 2025-05-07T20:03:31.1518981Z INFO:wheel:adding 'fbgemm_gpu/docs/__init__.py' 2025-05-07T20:03:31.1520253Z INFO:wheel:adding 'fbgemm_gpu/docs/common.py' 2025-05-07T20:03:31.1522113Z INFO:wheel:adding 'fbgemm_gpu/docs/examples.py' 2025-05-07T20:03:31.1524619Z INFO:wheel:adding 'fbgemm_gpu/docs/jagged_tensor_ops.py' 2025-05-07T20:03:31.1526208Z INFO:wheel:adding 'fbgemm_gpu/docs/merge_pooled_embedding_ops.py' 2025-05-07T20:03:31.1528495Z INFO:wheel:adding 'fbgemm_gpu/docs/permute_pooled_embedding_ops.py' 2025-05-07T20:03:31.1529941Z INFO:wheel:adding 'fbgemm_gpu/docs/quantize_ops.py' 2025-05-07T20:03:31.1536012Z INFO:wheel:adding 'fbgemm_gpu/docs/sparse_ops.py' 2025-05-07T20:03:31.1537719Z INFO:wheel:adding 'fbgemm_gpu/docs/version.py' 2025-05-07T20:03:31.1539457Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/__init__.py' 2025-05-07T20:03:31.1541868Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/ck_bf16_bench.py' 2025-05-07T20:03:31.1544989Z INFO:wheel:adding 
'fbgemm_gpu/experimental/bench/comm_bench.py' 2025-05-07T20:03:31.1548737Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/gather_scatter_bench.py' 2025-05-07T20:03:31.1554879Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_bench.py' 2025-05-07T20:03:31.1571597Z INFO:wheel:adding 'fbgemm_gpu/experimental/bench/quantize_ops.py' 2025-05-07T20:03:31.1573165Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/__init__.py' 2025-05-07T20:03:31.1721143Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/fbgemm_gpu_experimental_example_py.so' 2025-05-07T20:03:31.1734820Z INFO:wheel:adding 'fbgemm_gpu/experimental/example/utils.py' 2025-05-07T20:03:31.1735450Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/__init__.py' 2025-05-07T20:03:31.1763164Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/fp8_gemm.py' 2025-05-07T20:03:31.1769489Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/grouped_gemm.py' 2025-05-07T20:03:31.1773219Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/matmul_perf_model.py' 2025-05-07T20:03:31.1775118Z INFO:wheel:adding 'fbgemm_gpu/experimental/gemm/triton_gemm/utils.py' 2025-05-07T20:03:31.1776842Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/__init__.py' 2025-05-07T20:03:33.1645472Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so' 2025-05-07T20:03:33.3650263Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/quantize.py' 2025-05-07T20:03:33.3651373Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/README.md' 2025-05-07T20:03:33.3651946Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/__init__.py' 2025-05-07T20:03:33.3652483Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/activation.py' 2025-05-07T20:03:33.3656527Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/gather_scatter.py' 2025-05-07T20:03:33.3666783Z INFO:wheel:adding 'fbgemm_gpu/experimental/gen_ai/moe/layers.py' 2025-05-07T20:03:33.3670949Z INFO:wheel:adding 
'fbgemm_gpu/experimental/gen_ai/moe/shuffling.py' 2025-05-07T20:03:33.3672317Z INFO:wheel:adding 'fbgemm_gpu/quantize/__init__.py' 2025-05-07T20:03:33.3674175Z INFO:wheel:adding 'fbgemm_gpu/quantize/quantize_ops.py' 2025-05-07T20:03:33.3676247Z INFO:wheel:adding 'fbgemm_gpu/sll/__init__.py' 2025-05-07T20:03:33.3678116Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/__init__.py' 2025-05-07T20:03:33.3686425Z INFO:wheel:adding 'fbgemm_gpu/sll/cpu/cpu_sll.py' 2025-05-07T20:03:33.3687649Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/__init__.py' 2025-05-07T20:03:33.3689226Z INFO:wheel:adding 'fbgemm_gpu/sll/meta/meta_sll.py' 2025-05-07T20:03:33.3691879Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/__init__.py' 2025-05-07T20:03:33.3693644Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/common.py' 2025-05-07T20:03:33.3695200Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_dense_jagged_cat_jagged_out.py' 2025-05-07T20:03:33.3698336Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged2_to_padded_dense.py' 2025-05-07T20:03:33.3701311Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm.py' 2025-05-07T20:03:33.3705121Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_bmm_jagged_out.py' 2025-05-07T20:03:33.3707160Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_add.py' 2025-05-07T20:03:33.3709598Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_elementwise_mul_jagged_out.py' 2025-05-07T20:03:33.3715783Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_dense_flash_attention.py' 2025-05-07T20:03:33.3721065Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_flash_attention_basic.py' 2025-05-07T20:03:33.3723099Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_self_substraction_jagged_out.py' 2025-05-07T20:03:33.3726884Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_jagged_softmax.py' 2025-05-07T20:03:33.3732191Z INFO:wheel:adding 'fbgemm_gpu/sll/triton/triton_multi_head_jagged_flash_attention.py' 2025-05-07T20:03:33.3734686Z 
INFO:wheel:adding 'fbgemm_gpu/tbe/__init__.py' 2025-05-07T20:03:33.3736522Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/__init__.py' 2025-05-07T20:03:33.3738682Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_config.py' 2025-05-07T20:03:33.3743321Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/bench_runs.py' 2025-05-07T20:03:33.3746257Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eeg_cli.py' 2025-05-07T20:03:33.3748500Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/embedding_ops_common_config.py' 2025-05-07T20:03:33.3750259Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/eval_compression.py' 2025-05-07T20:03:33.3751372Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/reporter.py' 2025-05-07T20:03:33.3754582Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config.py' 2025-05-07T20:03:33.3757701Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_loader.py' 2025-05-07T20:03:33.3759975Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/tbe_data_config_param_models.py' 2025-05-07T20:03:33.3761533Z INFO:wheel:adding 'fbgemm_gpu/tbe/bench/utils.py' 2025-05-07T20:03:33.3762814Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/__init__.py' 2025-05-07T20:03:33.3764386Z INFO:wheel:adding 'fbgemm_gpu/tbe/cache/split_embeddings_cache_ops.py' 2025-05-07T20:03:33.3765821Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/__init__.py' 2025-05-07T20:03:33.3767129Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/common.py' 2025-05-07T20:03:33.3773276Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/inference.py' 2025-05-07T20:03:33.3801923Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/training.py' 2025-05-07T20:03:33.3803208Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/__init__.py' 2025-05-07T20:03:33.3804786Z INFO:wheel:adding 'fbgemm_gpu/tbe/ssd/utils/partially_materialized_tensor.py' 2025-05-07T20:03:33.3806282Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/__init__.py' 2025-05-07T20:03:33.3809319Z INFO:wheel:adding 'fbgemm_gpu/tbe/stats/bench_params_reporter.py' 2025-05-07T20:03:33.3810566Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/__init__.py' 
2025-05-07T20:03:33.3812119Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/common.py' 2025-05-07T20:03:33.3813695Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/offsets.py' 2025-05-07T20:03:33.3816131Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/quantize.py' 2025-05-07T20:03:33.3821569Z INFO:wheel:adding 'fbgemm_gpu/tbe/utils/requests.py' 2025-05-07T20:03:33.3823732Z INFO:wheel:adding 'fbgemm_gpu/triton/__init__.py' 2025-05-07T20:03:33.3825472Z INFO:wheel:adding 'fbgemm_gpu/triton/common.py' 2025-05-07T20:03:33.3833377Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize.py' 2025-05-07T20:03:33.3837859Z INFO:wheel:adding 'fbgemm_gpu/triton/quantize_ref.py' 2025-05-07T20:03:33.3839954Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/__init__.py' 2025-05-07T20:03:33.3847791Z INFO:wheel:adding 'fbgemm_gpu/triton/jagged/triton_jagged_tensor_ops.py' 2025-05-07T20:03:33.3849758Z INFO:wheel:adding 'fbgemm_gpu/utils/__init__.py' 2025-05-07T20:03:33.3851449Z INFO:wheel:adding 'fbgemm_gpu/utils/filestore.py' 2025-05-07T20:03:33.3852970Z INFO:wheel:adding 'fbgemm_gpu/utils/loader.py' 2025-05-07T20:03:33.3855090Z INFO:wheel:adding 'fbgemm_gpu/utils/torch_library.py' 2025-05-07T20:03:33.3857592Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/METADATA' 2025-05-07T20:03:33.3858496Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/WHEEL' 2025-05-07T20:03:33.3859488Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/top_level.txt' 2025-05-07T20:03:33.3891912Z INFO:wheel:adding 'fbgemm_gpu_genai_nightly-2025.5.7.dist-info/RECORD' 2025-05-07T20:03:33.3893705Z INFO:root:removing _skbuild/linux-x86_64-3.12/setuptools/bdist.linux-x86_64/wheel 2025-05-07T20:03:33.5180263Z ╒════════════════════════════╤════════════════════════════════════════════════╕ 2025-05-07T20:03:42.9491374Z │ │ Version │ 2025-05-07T20:03:42.9492374Z ╞════════════════════════════╪════════════════════════════════════════════════╡ 2025-05-07T20:03:42.9492944Z │ PyTorch │ 2.8.0.dev20250507+cu128 │ 
2025-05-07T20:03:42.9493792Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:42.9494380Z │ CUDA (Declared by PyTorch) │ 12.8 │ 2025-05-07T20:03:42.9495030Z ├────────────────────────────┼────────────────────────────────────────────────┤ 2025-05-07T20:03:42.9495628Z │ CUDA (Actual) │ nvcc: NVIDIA (R) Cuda compiler driver │ 2025-05-07T20:03:42.9496293Z │ │ Copyright (c) 2005-2025 NVIDIA Corporation │ 2025-05-07T20:03:42.9496826Z │ │ Built on Wed_Jan_15_19:20:09_PST_2025 │ 2025-05-07T20:03:42.9497313Z │ │ Cuda compilation tools, release 12.8, V12.8.61 │ 2025-05-07T20:03:42.9497852Z │ │ Build cuda_12.8.r12.8/compiler.35404655_0 │ 2025-05-07T20:03:42.9498414Z ╘════════════════════════════╧════════════════════════════════════════════════╛ 2025-05-07T20:03:42.9499031Z Successfully built fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:47.3397347Z 2025-05-07T20:03:47.4632464Z ################################################################################ 2025-05-07T20:03:47.4634478Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.4636376Z [CHECK] Listing out library size: 2025-05-07T20:03:47.4662020Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.4663801Z 2025-05-07T20:03:47.4748554Z 91 ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.4749064Z 2025-05-07T20:03:47.4796363Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.4800558Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 
2025-05-07T20:03:47.4802496Z 2025-05-07T20:03:47.5814643Z GLIBC_2.2.5 2025-05-07T20:03:47.5816124Z GLIBC_2.3 2025-05-07T20:03:47.5816792Z GLIBC_2.14 2025-05-07T20:03:47.5817198Z 2025-05-07T20:03:47.5817212Z 2025-05-07T20:03:47.5818791Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.5822220Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.5822944Z 2025-05-07T20:03:47.5927017Z GLIBCXX_3.4 2025-05-07T20:03:47.5928104Z GLIBCXX_3.4.9 2025-05-07T20:03:47.5928776Z GLIBCXX_3.4.11 2025-05-07T20:03:47.5929373Z GLIBCXX_3.4.18 2025-05-07T20:03:47.5929985Z GLIBCXX_3.4.20 2025-05-07T20:03:47.5930563Z GLIBCXX_3.4.21 2025-05-07T20:03:47.5931182Z GLIBCXX_3.4.29 2025-05-07T20:03:47.5931542Z 2025-05-07T20:03:47.5931567Z 2025-05-07T20:03:47.6529479Z + nm -gDC ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.f2o2GYjnxY.symbols.txt 2025-05-07T20:03:47.6531666Z 2025-05-07T20:03:47.6659801Z 2025-05-07T20:03:47.6941016Z [CHECK] Total Number of symbols: 1843 2025-05-07T20:03:47.6965868Z [CHECK] Number of fbgemm symbols: 619 2025-05-07T20:03:47.6988089Z + nm -gDCu ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so > /tmp/tmp.Yudvkar08D.usymbols.txt 2025-05-07T20:03:47.6988765Z 2025-05-07T20:03:47.7024643Z 2025-05-07T20:03:47.7058141Z [CHECK] Listing out undefined symbols (252 total): 2025-05-07T20:03:47.7074149Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7077049Z U VTT for std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7078721Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:47.7079325Z U __assert_fail@GLIBC_2.2.5 
2025-05-07T20:03:47.7079735Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:47.7080154Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:47.7080586Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:47.7080993Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:47.7081420Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:47.7081827Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:47.7082217Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:47.7082636Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:47.7082993Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:47.7083369Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:47.7083710Z U __cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:47.7084091Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:47.7084468Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:47.7084810Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:47.7085190Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:47.7085524Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:47.7085894Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:47.7086231Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:47.7086707Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:47.7087039Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:47.7087385Z U __udivti3@GCC_3.0 2025-05-07T20:03:47.7087833Z U __xstat@GLIBC_2.2.5 2025-05-07T20:03:47.7088170Z U at::CUDAGeneratorImpl::device_type() 2025-05-07T20:03:47.7088623Z U at::CUDAGeneratorImpl::philox_cuda_state(unsigned long) 2025-05-07T20:03:47.7089033Z U at::TensorMaker::make_tensor() 2025-05-07T20:03:47.7089519Z U at::_ops::add__Tensor::call(at::Tensor&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:47.7090025Z U at::_ops::div__Scalar::call(at::Tensor&, c10::Scalar const&) 2025-05-07T20:03:47.7091037Z U at::_ops::empty_like::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:47.7092386Z U 
at::_ops::empty_memory_format::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:47.7093436Z U at::_ops::expand::call(at::Tensor const&, c10::ArrayRef, bool) 2025-05-07T20:03:47.7093958Z U at::_ops::index_select::call(at::Tensor const&, long, at::Tensor const&) 2025-05-07T20:03:47.7094483Z U at::_ops::norm_Scalar::call(at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:47.7095086Z U at::_ops::scatter_add_::call(at::Tensor&, long, at::Tensor const&, at::Tensor const&) 2025-05-07T20:03:47.7095599Z U at::_ops::select_int::call(at::Tensor const&, long, c10::SymInt) 2025-05-07T20:03:47.7096134Z U at::_ops::split_sizes::call(at::Tensor const&, c10::ArrayRef, long) 2025-05-07T20:03:47.7096808Z U at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef, bool, std::optional) 2025-05-07T20:03:47.7097585Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:47.7098657Z U at::_ops::to_dtype_layout::call(at::Tensor const&, std::optional, std::optional, std::optional, std::optional, bool, bool, std::optional) 2025-05-07T20:03:47.7099505Z U at::_ops::unsqueeze::call(at::Tensor const&, long) 2025-05-07T20:03:47.7099969Z U at::_ops::view::call(at::Tensor const&, c10::ArrayRef) 2025-05-07T20:03:47.7100744Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:47.7101463Z U at::cuda::detail::getDefaultCUDAGenerator(signed char) 2025-05-07T20:03:47.7101905Z U at::cuda::getCurrentDeviceProperties() 2025-05-07T20:03:47.7102315Z U at::tensor(c10::ArrayRef, c10::TensorOptions const&) 2025-05-07T20:03:47.7102788Z U c10::AutogradMetaInterface::~AutogradMetaInterface() 2025-05-07T20:03:47.7103408Z U c10::BFloat16* at::TensorBase::data_ptr() const 2025-05-07T20:03:47.7104108Z U c10::BFloat16* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:47.7104593Z U c10::BoolType::get() 
2025-05-07T20:03:47.7105196Z U c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) 2025-05-07T20:03:47.7105834Z U c10::Error::what() const 2025-05-07T20:03:47.7106335Z U c10::Float8_e4m3fn* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:47.7106811Z U c10::FloatType::get() 2025-05-07T20:03:47.7107190Z U c10::GeneratorImpl::device() const 2025-05-07T20:03:47.7107556Z U c10::IValue::isTensorList() const 2025-05-07T20:03:47.7108022Z U c10::IValue::reportToTensorTypeError() const 2025-05-07T20:03:47.7108431Z U c10::IntType::get() 2025-05-07T20:03:47.7109364Z U c10::ListType::get(std::__cxx11::basic_string, std::allocator > const&, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:47.7110176Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:47.7110614Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:47.7111120Z U c10::OptionalType::get(c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:47.7111623Z U c10::ScalarTypeType::get() 2025-05-07T20:03:47.7112023Z U c10::StorageImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:47.7112446Z U c10::StringType::get() 2025-05-07T20:03:47.7112887Z U c10::SymBool::guard_bool(char const*, long) const 2025-05-07T20:03:47.7113346Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:47.7114062Z U c10::SymInt::SymInt(c10::intrusive_ptr >) 2025-05-07T20:03:47.7114736Z U c10::SymInt::guard_int(char const*, long) const 2025-05-07T20:03:47.7115152Z U c10::SymInt::toSymNode() const 2025-05-07T20:03:47.7116795Z U c10::SymbolicShapeMeta::init_is_contiguous() const 2025-05-07T20:03:47.7117554Z U c10::TensorImpl::set_autograd_meta(std::unique_ptr >) 2025-05-07T20:03:47.7118318Z U c10::TensorImpl::throw_data_ptr_access_error() const 2025-05-07T20:03:47.7118712Z U c10::TensorType::get() 2025-05-07T20:03:47.7119095Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:47.7120129Z U c10::Warning::Warning(std::variant, c10::SourceLocation 
const&, std::__cxx11::basic_string, std::allocator >, bool) 2025-05-07T20:03:47.7121235Z U c10::cuda::CUDACachingAllocator::allocator 2025-05-07T20:03:47.7121633Z U c10::cuda::CUDAStream::stream() const 2025-05-07T20:03:47.7121986Z U c10::cuda::ExchangeDevice(signed char) 2025-05-07T20:03:47.7122366Z U c10::cuda::GetDevice(signed char*) 2025-05-07T20:03:47.7122719Z U c10::cuda::MaybeSetDevice(signed char) 2025-05-07T20:03:47.7123102Z U c10::cuda::SetDevice(signed char) 2025-05-07T20:03:47.7123610Z U c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) 2025-05-07T20:03:47.7124071Z U c10::cuda::current_device() 2025-05-07T20:03:47.7124417Z U c10::cuda::device_count() 2025-05-07T20:03:47.7124761Z U c10::cuda::getCurrentCUDAStream(signed char) 2025-05-07T20:03:47.7125174Z U c10::cuda::getDefaultCUDAStream(signed char) 2025-05-07T20:03:47.7125564Z U c10::cuda::getStreamFromPool(bool, signed char) 2025-05-07T20:03:47.7125981Z U c10::cuda::getStreamFromPool(int, signed char) 2025-05-07T20:03:47.7126418Z U c10::cuda::setCurrentCUDAStream(c10::cuda::CUDAStream) 2025-05-07T20:03:47.7126806Z U c10::cuda::warn_or_error_on_sync() 2025-05-07T20:03:47.7127464Z U c10::detail::ListImpl::ListImpl(std::vector >, c10::Type::SingletonOrSharedTypePtr) 2025-05-07T20:03:47.7128460Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:47.7129328Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:47.7130423Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:47.7131551Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:47.7132666Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string, std::allocator > const&) 2025-05-07T20:03:47.7133482Z U 
c10::get_default_dtype() 2025-05-07T20:03:47.7134014Z U c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKeySet) 2025-05-07T20:03:47.7134649Z U c10::impl::ExcludeDispatchKeyGuard::~ExcludeDispatchKeyGuard() 2025-05-07T20:03:47.7135150Z U c10::impl::GPUTrace::gpuTraceState 2025-05-07T20:03:47.7135543Z U c10::impl::GPUTrace::haveState 2025-05-07T20:03:47.7135941Z U c10::impl::cow::is_cow_data_ptr(c10::DataPtr const&) 2025-05-07T20:03:47.7136427Z U c10::impl::cow::materialize_cow_storage(c10::StorageImpl&) 2025-05-07T20:03:47.7136859Z U c10::impl::device_guard_impl_registry 2025-05-07T20:03:47.7137273Z U c10::operator*(c10::SymInt const&, int) 2025-05-07T20:03:47.7137712Z U c10::operator-(c10::SymInt const&, int) 2025-05-07T20:03:47.7138088Z U c10::operator-(c10::SymInt const&, long) 2025-05-07T20:03:47.7138521Z U c10::operator<<(std::ostream&, c10::Device const&) 2025-05-07T20:03:47.7138929Z U c10::operator<<(std::ostream&, c10::DeviceType) 2025-05-07T20:03:47.7139311Z U c10::throwNullDataPtrError() 2025-05-07T20:03:47.7139663Z U c10::warn(c10::Warning const&) 2025-05-07T20:03:47.7139993Z U c10::warnDeprecatedDataPtr() 2025-05-07T20:03:47.7140775Z U c10d::getNcclErrorDetailStr(ncclResult_t, std::optional, std::allocator > >) 2025-05-07T20:03:47.7141531Z U c10d::ncclGetErrorWithVersion[abi:cxx11](ncclResult_t) 2025-05-07T20:03:47.7142026Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:47.7142484Z U caffe2::TypeMeta::typeMetaDatas() 2025-05-07T20:03:47.7142816Z U cublasLtCreate 2025-05-07T20:03:47.7143223Z U cublasLtMatmul 2025-05-07T20:03:47.7143556Z U cublasLtMatmulAlgoGetHeuristic 2025-05-07T20:03:47.7143920Z U cublasLtMatmulDescCreate 2025-05-07T20:03:47.7144273Z U cublasLtMatmulDescSetAttribute 2025-05-07T20:03:47.7144646Z U cublasLtMatmulPreferenceCreate 2025-05-07T20:03:47.7145011Z U cublasLtMatmulPreferenceSetAttribute 2025-05-07T20:03:47.7145393Z U cublasLtMatrixLayoutCreate 
2025-05-07T20:03:47.7145763Z U cudaDeviceGetAttribute@libcudart.so.12 2025-05-07T20:03:47.7146132Z U cudaDeviceSynchronize@libcudart.so.12 2025-05-07T20:03:47.7146523Z U cudaEventCreateWithFlags@libcudart.so.12 2025-05-07T20:03:47.7146888Z U cudaEventDestroy@libcudart.so.12 2025-05-07T20:03:47.7147263Z U cudaEventElapsedTime@libcudart.so.12 2025-05-07T20:03:47.7147613Z U cudaEventQuery@libcudart.so.12 2025-05-07T20:03:47.7147973Z U cudaEventRecord@libcudart.so.12 2025-05-07T20:03:47.7148329Z U cudaEventSynchronize@libcudart.so.12 2025-05-07T20:03:47.7148689Z U cudaFree@libcudart.so.12 2025-05-07T20:03:47.7149051Z U cudaFuncSetAttribute@libcudart.so.12 2025-05-07T20:03:47.7149400Z U cudaGetDevice@libcudart.so.12 2025-05-07T20:03:47.7149824Z U cudaGetDeviceProperties_v2@libcudart.so.12 2025-05-07T20:03:47.7150210Z U cudaGetDriverEntryPoint@libcudart.so.12 2025-05-07T20:03:47.7150585Z U cudaGetErrorName@libcudart.so.12 2025-05-07T20:03:47.7150934Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:47.7151299Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:47.7151672Z U cudaIpcGetMemHandle@libcudart.so.12 2025-05-07T20:03:47.7152030Z U cudaIpcOpenMemHandle@libcudart.so.12 2025-05-07T20:03:47.7152424Z U cudaLaunchCooperativeKernel@libcudart.so.12 2025-05-07T20:03:47.7152797Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:47.7153161Z U cudaLaunchKernelExC@libcudart.so.12 2025-05-07T20:03:47.7153505Z U cudaMalloc@libcudart.so.12 2025-05-07T20:03:47.7153908Z U cudaMemcpy@libcudart.so.12 2025-05-07T20:03:47.7154287Z U cudaMemcpyAsync@libcudart.so.12 2025-05-07T20:03:47.7154653Z U cudaMemsetAsync@libcudart.so.12 2025-05-07T20:03:47.7155045Z U cudaStreamQuery@libcudart.so.12 2025-05-07T20:03:47.7155419Z U cudaStreamSynchronize@libcudart.so.12 2025-05-07T20:03:47.7155959Z U cudaStreamWaitEvent@libcudart.so.12 2025-05-07T20:03:47.7156322Z U exit@GLIBC_2.2.5 2025-05-07T20:03:47.7156642Z U fclose@GLIBC_2.2.5 2025-05-07T20:03:47.7156958Z U fflush@GLIBC_2.2.5 
2025-05-07T20:03:47.7157293Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:47.7157725Z U float* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:47.7158081Z U fopen@GLIBC_2.2.5 2025-05-07T20:03:47.7158396Z U fprintf@GLIBC_2.2.5 2025-05-07T20:03:47.7158718Z U fread@GLIBC_2.2.5 2025-05-07T20:03:47.7159007Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:47.7159349Z U int* at::TensorBase::data_ptr() const 2025-05-07T20:03:47.7159755Z U int* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:47.7160213Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:47.7160635Z U long* at::TensorBase::data_ptr() const 2025-05-07T20:03:47.7161018Z U memcmp@GLIBC_2.2.5 2025-05-07T20:03:47.7161334Z U memcpy@GLIBC_2.14 2025-05-07T20:03:47.7161633Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:47.7161958Z U memset@GLIBC_2.2.5 2025-05-07T20:03:47.7162242Z U ncclAllGather 2025-05-07T20:03:47.7162547Z U ncclAllReduce 2025-05-07T20:03:47.7162830Z U ncclCommInitRank 2025-05-07T20:03:47.7163145Z U ncclGetUniqueId 2025-05-07T20:03:47.7163434Z U ncclReduceScatter 2025-05-07T20:03:47.7163800Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:03:47.7164179Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:47.7164528Z U printf@GLIBC_2.2.5 2025-05-07T20:03:47.7164905Z U signed char* at::TensorBase::data_ptr() const 2025-05-07T20:03:47.7165372Z U signed char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:47.7166013Z U std::__cxx11::basic_ostringstream, std::allocator >::basic_ostringstream() 2025-05-07T20:03:47.7166827Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:47.7167617Z U std::__cxx11::basic_stringbuf, std::allocator >::str() const &@GLIBCXX_3.4.29 2025-05-07T20:03:47.7168425Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:47.7169039Z U 
std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:47.7169401Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:47.7169765Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:47.7170124Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.7170552Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.7170970Z U std::__throw_out_of_range_fmt(char const*, ...)@GLIBCXX_3.4.20 2025-05-07T20:03:47.7171414Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:47.7171912Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 2025-05-07T20:03:47.7172615Z U std::basic_ios >::init(std::basic_streambuf >*)@GLIBCXX_3.4 2025-05-07T20:03:47.7173628Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.7174785Z U std::basic_ostream >& std::operator<< >(std::basic_ostream >&, char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.7175514Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:47.7175845Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:47.7176204Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:47.7176636Z U std::exception::what() const@GLIBCXX_3.4 2025-05-07T20:03:47.7177024Z U std::exception::~exception()@GLIBCXX_3.4 2025-05-07T20:03:47.7177379Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:47.7177756Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:47.7178102Z U std::ios_base::ios_base()@GLIBCXX_3.4 2025-05-07T20:03:47.7178501Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:47.7178845Z U std::locale::locale()@GLIBCXX_3.4 2025-05-07T20:03:47.7179204Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:47.7179618Z U std::logic_error::logic_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:47.7180018Z U std::logic_error::~logic_error()@GLIBCXX_3.4 2025-05-07T20:03:47.7180472Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.7180991Z U std::ostream& 
std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.7181575Z U std::ostream& std::ostream::_M_insert(void const*)@GLIBCXX_3.4.9 2025-05-07T20:03:47.7182047Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:47.7182395Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:47.7182777Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:47.7183271Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:47.7184205Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:47.7184936Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:47.7185318Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:47.7185684Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:47.7185998Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:47.7186372Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:47.7187210Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:47.7188653Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:47.7189521Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:47.7190409Z U torch::cuda::nccl::all2all(std::vector >&, std::vector >&, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:47.7191377Z U torch::cuda::nccl::all2all_single_equal_split(at::Tensor&, at::Tensor&, int, void*, c10::cuda::CUDAStream&) 2025-05-07T20:03:47.7192212Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:47.7192846Z U typeinfo for c10::Error 2025-05-07T20:03:47.7193228Z U typeinfo for std::exception@GLIBCXX_3.4 2025-05-07T20:03:47.7193645Z U typeinfo for std::logic_error@GLIBCXX_3.4 2025-05-07T20:03:47.7194037Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:47.7194536Z U unsigned 
char* at::TensorBase::mutable_data_ptr() const 2025-05-07T20:03:47.7194984Z U usleep@GLIBC_2.2.5 2025-05-07T20:03:47.7195408Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:47.7195976Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:47.7196429Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:47.7196835Z U vtable for c10::Error 2025-05-07T20:03:47.7197381Z U vtable for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7198174Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7198978Z U vtable for std::__cxx11::basic_stringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7199606Z U vtable for std::basic_ios >@GLIBCXX_3.4 2025-05-07T20:03:47.7200161Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:47.7200617Z U vtable for torch::autograd::AutogradMeta 2025-05-07T20:03:47.7201001Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:47.7201354Z w _ITM_registerTMCloneTable 2025-05-07T20:03:47.7201672Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:47.7202009Z w __gmon_start__ 2025-05-07T20:03:47.7202289Z w __pthread_key_create 2025-05-07T20:03:47.7202631Z w pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:47.7202964Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:47.7203353Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:47.7203913Z + ldd ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.7204360Z 2025-05-07T20:03:47.7204500Z linux-vdso.so.1 (0x00007ffee25f2000) 2025-05-07T20:03:47.7204837Z libtorch.so => not found 2025-05-07T20:03:47.7205095Z libc10.so => not found 2025-05-07T20:03:47.7205379Z libc10_cuda.so => not found 2025-05-07T20:03:47.7205646Z libnccl.so.2 => not found 2025-05-07T20:03:47.7205943Z libtorch_cpu.so => not found 2025-05-07T20:03:47.7206212Z libtorch_cuda.so => not found 
2025-05-07T20:03:47.7206517Z libcudart.so.12 => not found 2025-05-07T20:03:47.7206856Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fa32859c000) 2025-05-07T20:03:47.7207301Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fa32e4b4000) 2025-05-07T20:03:47.7207698Z libc.so.6 => /lib64/libc.so.6 (0x00007fa328394000) 2025-05-07T20:03:47.7208097Z /lib64/ld-linux-x86-64.so.2 (0x00007fa32e4e8000) 2025-05-07T20:03:47.7208484Z libm.so.6 => /lib64/libm.so.6 (0x00007fa3282b9000) 2025-05-07T20:03:47.7208869Z 2025-05-07T20:03:47.7209158Z [CHECK] Displaying ELF information: 2025-05-07T20:03:47.7209821Z + readelf -d ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so 2025-05-07T20:03:47.7210293Z 2025-05-07T20:03:47.7292812Z 2025-05-07T20:03:47.7293482Z Dynamic section at offset 0x5a661e0 contains 38 entries: 2025-05-07T20:03:47.7293917Z Tag Type Name/Value 2025-05-07T20:03:47.7294422Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:47.7294986Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:47.7295501Z 0x0000000000000001 (NEEDED) Shared library: [libc10_cuda.so] 2025-05-07T20:03:47.7296232Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:47.7296775Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:47.7297444Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:47.7298068Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:47.7298588Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:47.7299212Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:47.7299686Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:47.7300208Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:47.7300759Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_gen_ai.so] 
2025-05-07T20:03:47.7301246Z 0x000000000000000c (INIT) 0x59000 2025-05-07T20:03:47.7301579Z 0x000000000000000d (FINI) 0x4a1fac 2025-05-07T20:03:47.7301948Z 0x0000000000000019 (INIT_ARRAY) 0x5a658a0 2025-05-07T20:03:47.7302379Z 0x000000000000001b (INIT_ARRAYSZ) 1136 (bytes) 2025-05-07T20:03:47.7302737Z 0x000000000000001a (FINI_ARRAY) 0x5a65d10 2025-05-07T20:03:47.7303110Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:47.7303754Z 0x0000000000000004 (HASH) 0x238 2025-05-07T20:03:47.7304141Z 0x000000006ffffef5 (GNU_HASH) 0x2ef0 2025-05-07T20:03:47.7304501Z 0x0000000000000005 (STRTAB) 0x10c88 2025-05-07T20:03:47.7304880Z 0x0000000000000006 (SYMTAB) 0x5fa8 2025-05-07T20:03:47.7305248Z 0x000000000000000a (STRSZ) 229126 (bytes) 2025-05-07T20:03:47.7305661Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:47.7306060Z 0x0000000000000003 (PLTGOT) 0x5a67490 2025-05-07T20:03:47.7306451Z 0x0000000000000002 (PLTRELSZ) 19224 (bytes) 2025-05-07T20:03:47.7306859Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:47.7307209Z 0x0000000000000017 (JMPREL) 0x53750 2025-05-07T20:03:47.7307585Z 0x0000000000000007 (RELA) 0x49b38 2025-05-07T20:03:47.7307956Z 0x0000000000000008 (RELASZ) 39960 (bytes) 2025-05-07T20:03:47.7308359Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:47.7308913Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:47.7309301Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:47.7309713Z 0x000000006ffffffe (VERNEED) 0x499f8 2025-05-07T20:03:47.7310069Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:47.7310452Z 0x000000006ffffff0 (VERSYM) 0x48b8e 2025-05-07T20:03:47.7310807Z 0x000000006ffffff9 (RELACOUNT) 215 2025-05-07T20:03:47.7311169Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:47.7311383Z 2025-05-07T20:03:47.7311513Z ################################################################################ 2025-05-07T20:03:47.7311855Z 2025-05-07T20:03:47.7311859Z 2025-05-07T20:03:47.7311994Z 
################################################################################ 2025-05-07T20:03:47.7312704Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7313361Z [CHECK] Listing out library size: 2025-05-07T20:03:47.7314009Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7314536Z 2025-05-07T20:03:47.7314934Z 1 ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7315398Z 2025-05-07T20:03:47.7316122Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7317513Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.7318276Z 2025-05-07T20:03:47.7371654Z GLIBC_2.2.5 2025-05-07T20:03:47.7372319Z GLIBC_2.14 2025-05-07T20:03:47.7372667Z 2025-05-07T20:03:47.7372728Z 2025-05-07T20:03:47.7374311Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7375875Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.7376750Z 2025-05-07T20:03:47.7426026Z GLIBCXX_3.4 2025-05-07T20:03:47.7426722Z GLIBCXX_3.4.9 2025-05-07T20:03:47.7427322Z GLIBCXX_3.4.21 2025-05-07T20:03:47.7427732Z 2025-05-07T20:03:47.7427746Z 2025-05-07T20:03:47.7450848Z + nm -gDC ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > 
/tmp/tmp.agS1JldsQz.symbols.txt 2025-05-07T20:03:47.7451906Z 2025-05-07T20:03:47.7466878Z 2025-05-07T20:03:47.7497092Z [CHECK] Total Number of symbols: 155 2025-05-07T20:03:47.7518824Z [CHECK] Number of fbgemm symbols: 19 2025-05-07T20:03:47.7541401Z + nm -gDCu ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so > /tmp/tmp.ZeAuXxZfCW.usymbols.txt 2025-05-07T20:03:47.7542108Z 2025-05-07T20:03:47.7563338Z 2025-05-07T20:03:47.7598527Z [CHECK] Listing out undefined symbols (76 total): 2025-05-07T20:03:47.7620037Z U VTT for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7621764Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:47.7622545Z U __cudaPopCallConfiguration@libcudart.so.12 2025-05-07T20:03:47.7622959Z U __cudaPushCallConfiguration@libcudart.so.12 2025-05-07T20:03:47.7623514Z U __cudaRegisterFatBinary@libcudart.so.12 2025-05-07T20:03:47.7623960Z U __cudaRegisterFatBinaryEnd@libcudart.so.12 2025-05-07T20:03:47.7624372Z U __cudaRegisterFunction@libcudart.so.12 2025-05-07T20:03:47.7624776Z U __cudaRegisterVar@libcudart.so.12 2025-05-07T20:03:47.7625174Z U __cudaUnregisterFatBinary@libcudart.so.12 2025-05-07T20:03:47.7625587Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:47.7625951Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:47.7626308Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:47.7626672Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:47.7627004Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:47.7627517Z U at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) 2025-05-07T20:03:47.7628198Z U at::_ops::to_dtype::call(at::Tensor const&, c10::ScalarType, bool, bool, std::optional) 2025-05-07T20:03:47.7629331Z U at::_ops::zeros::call(c10::ArrayRef, std::optional, std::optional, std::optional, std::optional) 2025-05-07T20:03:47.7630077Z U c10::FloatType::get() 2025-05-07T20:03:47.7630449Z U c10::IValue::reportToTensorTypeError() const 
2025-05-07T20:03:47.7630930Z U c10::MessageLogger::MessageLogger(char const*, int, int) 2025-05-07T20:03:47.7631361Z U c10::MessageLogger::~MessageLogger() 2025-05-07T20:03:47.7631786Z U c10::SymFloat::guard_float(char const*, long) const 2025-05-07T20:03:47.7632195Z U c10::TensorType::get() 2025-05-07T20:03:47.7632538Z U c10::UndefinedTensorImpl::_singleton 2025-05-07T20:03:47.7633381Z U c10::detail::infer_schema::make_function_schema(c10::ArrayRef, c10::ArrayRef) 2025-05-07T20:03:47.7634278Z U c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) 2025-05-07T20:03:47.7635027Z U c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*) 2025-05-07T20:03:47.7635710Z U caffe2::TypeMeta::error_unsupported_typemeta(caffe2::TypeMeta) 2025-05-07T20:03:47.7636212Z U cudaGetErrorString@libcudart.so.12 2025-05-07T20:03:47.7636617Z U cudaGetLastError@libcudart.so.12 2025-05-07T20:03:47.7636988Z U cudaLaunchKernel@libcudart.so.12 2025-05-07T20:03:47.7637581Z U float* at::TensorBase::data_ptr() const 2025-05-07T20:03:47.7638174Z U long c10::detail::maybe_wrap_dim_slow(long, long, bool) 2025-05-07T20:03:47.7638578Z U memcpy@GLIBC_2.14 2025-05-07T20:03:47.7638916Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:47.7639230Z U memset@GLIBC_2.2.5 2025-05-07T20:03:47.7639682Z U ncclCommDestroy 2025-05-07T20:03:47.7640080Z U ncclCommInitAll 2025-05-07T20:03:47.7640436Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:03:47.7640814Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:47.7641441Z U std::__cxx11::basic_ostringstream, std::allocator >::~basic_ostringstream()@GLIBCXX_3.4.21 2025-05-07T20:03:47.7642055Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:47.7642411Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.7642817Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.7643288Z U std::basic_ios >::clear(std::_Ios_Iostate)@GLIBCXX_3.4 
2025-05-07T20:03:47.7643979Z U std::basic_ios >::init(std::basic_streambuf >*)@GLIBCXX_3.4 2025-05-07T20:03:47.7644982Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.7645751Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:47.7646118Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:47.7646485Z U std::ios_base::ios_base()@GLIBCXX_3.4 2025-05-07T20:03:47.7646825Z U std::ios_base::~ios_base()@GLIBCXX_3.4 2025-05-07T20:03:47.7647189Z U std::locale::locale()@GLIBCXX_3.4 2025-05-07T20:03:47.7647523Z U std::locale::~locale()@GLIBCXX_3.4 2025-05-07T20:03:47.7647949Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.7648369Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:47.7649057Z U std::runtime_error::runtime_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:47.7649733Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:47.7650084Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:47.7650419Z U torch::CppFunction::~CppFunction() 2025-05-07T20:03:47.7651198Z U torch::Library::Library(torch::Library::Kind, std::__cxx11::basic_string, std::allocator >, std::optional, char const*, unsigned int) 2025-05-07T20:03:47.7652317Z U torch::Library::_def(c10::FunctionSchema&&, c10::OperatorName*, std::vector > const&, torch::_RegisterOrVerify) & 2025-05-07T20:03:47.7653153Z U torch::Library::_impl(char const*, torch::CppFunction&&, torch::_RegisterOrVerify) & 2025-05-07T20:03:47.7653850Z U torch::jit::parseSchema(std::__cxx11::basic_string, std::allocator > const&, bool) 2025-05-07T20:03:47.7654460Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:47.7654882Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:47.7655296Z U vtable for __cxxabiv1::__function_type_info@CXXABI_1.3 2025-05-07T20:03:47.7655766Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 
2025-05-07T20:03:47.7656359Z U vtable for std::__cxx11::basic_ostringstream, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7657138Z U vtable for std::__cxx11::basic_stringbuf, std::allocator >@GLIBCXX_3.4.21 2025-05-07T20:03:47.7657778Z U vtable for std::basic_ios >@GLIBCXX_3.4 2025-05-07T20:03:47.7658297Z U vtable for std::basic_streambuf >@GLIBCXX_3.4 2025-05-07T20:03:47.7658786Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:47.7659135Z w _ITM_registerTMCloneTable 2025-05-07T20:03:47.7659448Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:47.7659773Z w __gmon_start__ 2025-05-07T20:03:47.7660052Z w __pthread_key_create 2025-05-07T20:03:47.7660425Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:47.7661008Z + ldd ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7661475Z 2025-05-07T20:03:47.7668880Z linux-vdso.so.1 (0x00007ffc947ea000) 2025-05-07T20:03:47.7669247Z libc10.so => not found 2025-05-07T20:03:47.7669535Z libnccl.so.2 => not found 2025-05-07T20:03:47.7669860Z libtorch_cpu.so => not found 2025-05-07T20:03:47.7670151Z libtorch_cuda.so => not found 2025-05-07T20:03:47.7670470Z libcudart.so.12 => not found 2025-05-07T20:03:47.7670756Z libtorch.so => not found 2025-05-07T20:03:47.7671122Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007efc7ba39000) 2025-05-07T20:03:47.7671557Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007efc7ba0b000) 2025-05-07T20:03:47.7671984Z libc.so.6 => /lib64/libc.so.6 (0x00007efc7b803000) 2025-05-07T20:03:47.7672383Z libm.so.6 => /lib64/libm.so.6 (0x00007efc7b728000) 2025-05-07T20:03:47.7672764Z /lib64/ld-linux-x86-64.so.2 (0x00007efc7bd18000) 2025-05-07T20:03:47.7673016Z 2025-05-07T20:03:47.7673158Z [CHECK] Displaying ELF information: 2025-05-07T20:03:47.7673762Z + readelf -d ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/example/fbgemm_gpu_experimental_example_py.so 2025-05-07T20:03:47.7674279Z 2025-05-07T20:03:47.7711769Z 
2025-05-07T20:03:47.7712125Z Dynamic section at offset 0x739a8 contains 36 entries: 2025-05-07T20:03:47.7712570Z Tag Type Name/Value 2025-05-07T20:03:47.7713040Z 0x0000000000000001 (NEEDED) Shared library: [libc10.so] 2025-05-07T20:03:47.7713757Z 0x0000000000000001 (NEEDED) Shared library: [libnccl.so.2] 2025-05-07T20:03:47.7714326Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:47.7714863Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:47.7715430Z 0x0000000000000001 (NEEDED) Shared library: [libcudart.so.12] 2025-05-07T20:03:47.7715962Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:47.7716508Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:47.7717035Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:47.7717587Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:47.7718206Z 0x000000000000000e (SONAME) Library soname: [fbgemm_gpu_experimental_example_py.so] 2025-05-07T20:03:47.7718775Z 0x000000000000000c (INIT) 0x6000 2025-05-07T20:03:47.7719158Z 0x000000000000000d (FINI) 0xbadc 2025-05-07T20:03:47.7719509Z 0x0000000000000019 (INIT_ARRAY) 0x738b8 2025-05-07T20:03:47.7719900Z 0x000000000000001b (INIT_ARRAYSZ) 32 (bytes) 2025-05-07T20:03:47.7720264Z 0x000000000000001a (FINI_ARRAY) 0x738d8 2025-05-07T20:03:47.7720656Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:47.7721088Z 0x0000000000000004 (HASH) 0x200 2025-05-07T20:03:47.7721430Z 0x000000006ffffef5 (GNU_HASH) 0x900 2025-05-07T20:03:47.7721817Z 0x0000000000000005 (STRTAB) 0x1b70 2025-05-07T20:03:47.7722163Z 0x0000000000000006 (SYMTAB) 0xcd0 2025-05-07T20:03:47.7722569Z 0x000000000000000a (STRSZ) 10385 (bytes) 2025-05-07T20:03:47.7722953Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:47.7723342Z 0x0000000000000003 (PLTGOT) 0x73c38 2025-05-07T20:03:47.7723713Z 0x0000000000000002 (PLTRELSZ) 1872 (bytes) 
2025-05-07T20:03:47.7724103Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:47.7724561Z 0x0000000000000017 (JMPREL) 0x4cb8 2025-05-07T20:03:47.7724902Z 0x0000000000000007 (RELA) 0x4610 2025-05-07T20:03:47.7725288Z 0x0000000000000008 (RELASZ) 1704 (bytes) 2025-05-07T20:03:47.7725660Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:47.7726035Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:47.7726377Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:47.7726768Z 0x000000006ffffffe (VERNEED) 0x4540 2025-05-07T20:03:47.7727117Z 0x000000006fffffff (VERNEEDNUM) 4 2025-05-07T20:03:47.7727478Z 0x000000006ffffff0 (VERSYM) 0x4402 2025-05-07T20:03:47.7727837Z 0x000000006ffffff9 (RELACOUNT) 7 2025-05-07T20:03:47.7728176Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:47.7728408Z 2025-05-07T20:03:47.7728575Z ################################################################################ 2025-05-07T20:03:47.7728817Z 2025-05-07T20:03:47.7728821Z 2025-05-07T20:03:47.7728958Z ################################################################################ 2025-05-07T20:03:47.7729443Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.7729917Z [CHECK] Listing out library size: 2025-05-07T20:03:47.7730338Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.7730666Z 2025-05-07T20:03:47.7731285Z 1 ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.7731611Z 2025-05-07T20:03:47.7731990Z [CHECK] Listing out the GLIBC versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.7732883Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.7733447Z 2025-05-07T20:03:47.7833197Z GLIBC_2.2.5 2025-05-07T20:03:47.7833834Z GLIBC_2.14 2025-05-07T20:03:47.7835688Z 2025-05-07T20:03:47.7835779Z 2025-05-07T20:03:47.7836682Z [CHECK] Listing 
out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.7837664Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.7838233Z 2025-05-07T20:03:47.7907512Z 2025-05-07T20:03:47.7907529Z 2025-05-07T20:03:47.7932804Z + nm -gDC ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so > /tmp/tmp.bF6X3sMzYR.symbols.txt 2025-05-07T20:03:47.7934055Z 2025-05-07T20:03:47.7961297Z 2025-05-07T20:03:47.7987965Z [CHECK] Total Number of symbols: 803 2025-05-07T20:03:47.8005911Z [CHECK] Number of fbgemm symbols: 0 2025-05-07T20:03:47.8026238Z + nm -gDCu ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so > /tmp/tmp.5GasVHStnc.usymbols.txt 2025-05-07T20:03:47.8027502Z 2025-05-07T20:03:47.8043342Z 2025-05-07T20:03:47.8067638Z [CHECK] Listing out undefined symbols (49 total): 2025-05-07T20:03:47.8085147Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:47.8086250Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:47.8087242Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:47.8088209Z U __errno_location@GLIBC_2.2.5 2025-05-07T20:03:47.8089146Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:47.8090395Z U __popcountdi2@GCC_3.4 2025-05-07T20:03:47.8091250Z U abort@GLIBC_2.2.5 2025-05-07T20:03:47.8092087Z U close@GLIBC_2.2.5 2025-05-07T20:03:47.8092878Z U fputs@GLIBC_2.2.5 2025-05-07T20:03:47.8093337Z U free@GLIBC_2.2.5 2025-05-07T20:03:47.8093625Z U ftruncate64@GLIBC_2.2.5 2025-05-07T20:03:47.8093948Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:47.8094257Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:47.8094547Z U getpagesize@GLIBC_2.2.5 2025-05-07T20:03:47.8094876Z U madvise@GLIBC_2.2.5 2025-05-07T20:03:47.8095235Z U malloc@GLIBC_2.2.5 2025-05-07T20:03:47.8095562Z U memcmp@GLIBC_2.2.5 2025-05-07T20:03:47.8095852Z U memcpy@GLIBC_2.14 2025-05-07T20:03:47.8096161Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:47.8096456Z U memset@GLIBC_2.2.5 
2025-05-07T20:03:47.8096770Z U mmap@GLIBC_2.2.5 2025-05-07T20:03:47.8097078Z U mprotect@GLIBC_2.2.5 2025-05-07T20:03:47.8097368Z U munmap@GLIBC_2.2.5 2025-05-07T20:03:47.8097671Z U open64@GLIBC_2.2.5 2025-05-07T20:03:47.8097999Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:03:47.8098396Z U pthread_mutex_destroy@GLIBC_2.2.5 2025-05-07T20:03:47.8098727Z U pthread_mutex_lock@GLIBC_2.2.5 2025-05-07T20:03:47.8099076Z U pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:47.8099383Z U read@GLIBC_2.2.5 2025-05-07T20:03:47.8099685Z U realloc@GLIBC_2.2.5 2025-05-07T20:03:47.8099992Z U shm_open 2025-05-07T20:03:47.8100243Z U shm_unlink 2025-05-07T20:03:47.8100532Z U snprintf@GLIBC_2.2.5 2025-05-07T20:03:47.8100822Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:47.8101131Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:47.8101415Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:47.8101721Z U strtol@GLIBC_2.2.5 2025-05-07T20:03:47.8102001Z U syscall@GLIBC_2.2.5 2025-05-07T20:03:47.8102313Z U sysconf@GLIBC_2.2.5 2025-05-07T20:03:47.8102589Z U uname@GLIBC_2.2.5 2025-05-07T20:03:47.8102886Z U unlink@GLIBC_2.2.5 2025-05-07T20:03:47.8103319Z U vsnprintf@GLIBC_2.2.5 2025-05-07T20:03:47.8103880Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:47.8104480Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:47.8104932Z U vtable for __cxxabiv1::__vmi_class_type_info@CXXABI_1.3 2025-05-07T20:03:47.8105365Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:47.8105708Z w _ITM_registerTMCloneTable 2025-05-07T20:03:47.8106062Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:47.8106406Z w __gmon_start__ 2025-05-07T20:03:47.8106751Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:47.8107205Z + ldd ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.8107471Z 2025-05-07T20:03:47.8138193Z linux-vdso.so.1 (0x00007ffebddf8000) 2025-05-07T20:03:47.8139235Z libtorch_cpu.so => not found 2025-05-07T20:03:47.8140025Z 
libtorch_cuda.so => not found 2025-05-07T20:03:47.8140828Z libtorch.so => not found 2025-05-07T20:03:47.8142087Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f16c4dcc000) 2025-05-07T20:03:47.8143601Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f16c4d9e000) 2025-05-07T20:03:47.8144041Z libc.so.6 => /lib64/libc.so.6 (0x00007f16c4b94000) 2025-05-07T20:03:47.8144442Z libm.so.6 => /lib64/libm.so.6 (0x00007f16c4ab9000) 2025-05-07T20:03:47.8144845Z /lib64/ld-linux-x86-64.so.2 (0x00007f16c50af000) 2025-05-07T20:03:47.8145094Z 2025-05-07T20:03:47.8145216Z [CHECK] Displaying ELF information: 2025-05-07T20:03:47.8145730Z + readelf -d ./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so 2025-05-07T20:03:47.8146022Z 2025-05-07T20:03:47.8176015Z 2025-05-07T20:03:47.8176790Z Dynamic section at offset 0x78e78 contains 33 entries: 2025-05-07T20:03:47.8177962Z Tag Type Name/Value 2025-05-07T20:03:47.8179281Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:47.8180819Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:47.8182325Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:47.8184066Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:47.8184966Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:47.8185489Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:47.8185984Z 0x000000000000000e (SONAME) Library soname: [asmjit.so] 2025-05-07T20:03:47.8186425Z 0x000000000000000c (INIT) 0x1a000 2025-05-07T20:03:47.8186779Z 0x000000000000000d (FINI) 0x5af2c 2025-05-07T20:03:47.8187116Z 0x0000000000000019 (INIT_ARRAY) 0x780a0 2025-05-07T20:03:47.8187482Z 0x000000000000001b (INIT_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:47.8187829Z 0x000000000000001a (FINI_ARRAY) 0x780a8 2025-05-07T20:03:47.8188186Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:47.8188529Z 0x0000000000000004 (HASH) 0x200 2025-05-07T20:03:47.8188884Z 
0x000000006ffffef5 (GNU_HASH) 0x1e18 2025-05-07T20:03:47.8189219Z 0x0000000000000005 (STRTAB) 0x86e0 2025-05-07T20:03:47.8189567Z 0x0000000000000006 (SYMTAB) 0x3b80 2025-05-07T20:03:47.8189935Z 0x000000000000000a (STRSZ) 45342 (bytes) 2025-05-07T20:03:47.8190296Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:47.8190661Z 0x0000000000000003 (PLTGOT) 0x790d8 2025-05-07T20:03:47.8191019Z 0x0000000000000002 (PLTRELSZ) 8064 (bytes) 2025-05-07T20:03:47.8191388Z 0x0000000000000014 (PLTREL) RELA 2025-05-07T20:03:47.8191718Z 0x0000000000000017 (JMPREL) 0x17220 2025-05-07T20:03:47.8192070Z 0x0000000000000007 (RELA) 0x13ed8 2025-05-07T20:03:47.8192415Z 0x0000000000000008 (RELASZ) 13128 (bytes) 2025-05-07T20:03:47.8192784Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:47.8193124Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:47.8193515Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:47.8193875Z 0x000000006ffffffe (VERNEED) 0x13e48 2025-05-07T20:03:47.8194203Z 0x000000006fffffff (VERNEEDNUM) 3 2025-05-07T20:03:47.8194539Z 0x000000006ffffff0 (VERSYM) 0x137fe 2025-05-07T20:03:47.8194862Z 0x000000006ffffff9 (RELACOUNT) 3 2025-05-07T20:03:47.8195185Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:47.8195390Z 2025-05-07T20:03:47.8195522Z ################################################################################ 2025-05-07T20:03:47.8195753Z 2025-05-07T20:03:47.8195758Z 2025-05-07T20:03:47.8195874Z ################################################################################ 2025-05-07T20:03:47.8196322Z [CHECK] BUILT LIBRARY: ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.8196744Z [CHECK] Listing out library size: 2025-05-07T20:03:47.8197193Z + du -h --block-size=1M ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.8197512Z 2025-05-07T20:03:47.8197687Z 6 ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.8197934Z 2025-05-07T20:03:47.8198259Z [CHECK] Listing out the GLIBC versions 
referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.8199134Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/GLIBC_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.8201924Z 2025-05-07T20:03:47.8523044Z GLIBC_2.2.5 2025-05-07T20:03:47.8523497Z GLIBC_2.3 2025-05-07T20:03:47.8523726Z GLIBC_2.14 2025-05-07T20:03:47.8524018Z 2025-05-07T20:03:47.8524022Z 2025-05-07T20:03:47.8524380Z [CHECK] Listing out the GLIBCXX versions referenced by: ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.8525298Z + objdump -TC ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so | grep GLIBCXX_ | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat 2025-05-07T20:03:47.8525852Z 2025-05-07T20:03:47.8790840Z GLIBCXX_3.4 2025-05-07T20:03:47.8791110Z GLIBCXX_3.4.9 2025-05-07T20:03:47.8791351Z GLIBCXX_3.4.11 2025-05-07T20:03:47.8791565Z GLIBCXX_3.4.14 2025-05-07T20:03:47.8791919Z GLIBCXX_3.4.15 2025-05-07T20:03:47.8792132Z GLIBCXX_3.4.18 2025-05-07T20:03:47.8792355Z GLIBCXX_3.4.21 2025-05-07T20:03:47.8796545Z 2025-05-07T20:03:47.8796550Z 2025-05-07T20:03:47.8815721Z + nm -gDC ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so > /tmp/tmp.iGSbKRLhAY.symbols.txt 2025-05-07T20:03:47.8816939Z 2025-05-07T20:03:47.9033623Z 2025-05-07T20:03:47.9061173Z [CHECK] Total Number of symbols: 4871 2025-05-07T20:03:47.9093945Z [CHECK] Number of fbgemm symbols: 3365 2025-05-07T20:03:47.9114219Z + nm -gDCu ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so > /tmp/tmp.euVn8o0mbV.usymbols.txt 2025-05-07T20:03:47.9114670Z 2025-05-07T20:03:47.9149402Z 2025-05-07T20:03:47.9180496Z [CHECK] Listing out undefined symbols (135 total): 2025-05-07T20:03:47.9200376Z U _Unwind_Resume@GCC_3.0 2025-05-07T20:03:47.9200880Z U __cxa_allocate_exception@CXXABI_1.3 2025-05-07T20:03:47.9201271Z U __cxa_atexit@GLIBC_2.2.5 2025-05-07T20:03:47.9201672Z U __cxa_begin_catch@CXXABI_1.3 2025-05-07T20:03:47.9202025Z U 
__cxa_end_catch@CXXABI_1.3 2025-05-07T20:03:47.9202390Z U __cxa_free_exception@CXXABI_1.3 2025-05-07T20:03:47.9202720Z U __cxa_guard_abort@CXXABI_1.3 2025-05-07T20:03:47.9203063Z U __cxa_guard_acquire@CXXABI_1.3 2025-05-07T20:03:47.9203405Z U __cxa_guard_release@CXXABI_1.3 2025-05-07T20:03:47.9203754Z U __cxa_init_primary_exception@CXXABI_1.3.11 2025-05-07T20:03:47.9204120Z U __cxa_rethrow@CXXABI_1.3 2025-05-07T20:03:47.9204434Z U __cxa_thread_atexit@CXXABI_1.3.7 2025-05-07T20:03:47.9204771Z U __cxa_throw@CXXABI_1.3 2025-05-07T20:03:47.9205105Z U __cxa_throw_bad_array_new_length@CXXABI_1.3.8 2025-05-07T20:03:47.9205478Z U __gxx_personality_v0@CXXABI_1.3 2025-05-07T20:03:47.9206015Z U __once_proxy@GLIBCXX_3.4.11 2025-05-07T20:03:47.9206329Z U __tls_get_addr@GLIBC_2.3 2025-05-07T20:03:47.9206646Z U abort@GLIBC_2.2.5 2025-05-07T20:03:47.9207057Z U asmjit::_abi_1_13::BaseAssembler::bind(asmjit::_abi_1_13::Label const&) 2025-05-07T20:03:47.9207533Z U asmjit::_abi_1_13::BaseAssembler::newLabel() 2025-05-07T20:03:47.9208049Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:47.9209032Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:47.9210088Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:47.9211329Z U asmjit::_abi_1_13::BaseEmitter::_emitI(unsigned int, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&, asmjit::_abi_1_13::Operand_ const&) 2025-05-07T20:03:47.9212478Z U asmjit::_abi_1_13::BaseEmitter::emitArgsAssignment(asmjit::_abi_1_13::FuncFrame const&, asmjit::_abi_1_13::FuncArgsAssignment const&) 2025-05-07T20:03:47.9213323Z U asmjit::_abi_1_13::BaseEmitter::emitEpilog(asmjit::_abi_1_13::FuncFrame const&) 
2025-05-07T20:03:47.9213912Z U asmjit::_abi_1_13::BaseEmitter::emitProlog(asmjit::_abi_1_13::FuncFrame const&) 2025-05-07T20:03:47.9214551Z U asmjit::_abi_1_13::CodeHolder::CodeHolder(asmjit::_abi_1_13::Support::Temporary const*) 2025-05-07T20:03:47.9215214Z U asmjit::_abi_1_13::CodeHolder::init(asmjit::_abi_1_13::Environment const&, unsigned long) 2025-05-07T20:03:47.9215730Z U asmjit::_abi_1_13::CodeHolder::~CodeHolder() 2025-05-07T20:03:47.9216288Z U asmjit::_abi_1_13::FuncArgsAssignment::updateFuncFrame(asmjit::_abi_1_13::FuncFrame&) const 2025-05-07T20:03:47.9217080Z U asmjit::_abi_1_13::FuncDetail::init(asmjit::_abi_1_13::FuncSignature const&, asmjit::_abi_1_13::Environment const&) 2025-05-07T20:03:47.9217709Z U asmjit::_abi_1_13::FuncFrame::finalize() 2025-05-07T20:03:47.9218191Z U asmjit::_abi_1_13::FuncFrame::init(asmjit::_abi_1_13::FuncDetail const&) 2025-05-07T20:03:47.9218826Z U asmjit::_abi_1_13::JitRuntime::JitRuntime(asmjit::_abi_1_13::JitAllocator::CreateParams const*) 2025-05-07T20:03:47.9219495Z U asmjit::_abi_1_13::JitRuntime::_add(void**, asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:03:47.9220011Z U asmjit::_abi_1_13::JitRuntime::~JitRuntime() 2025-05-07T20:03:47.9220490Z U asmjit::_abi_1_13::x86::Assembler::Assembler(asmjit::_abi_1_13::CodeHolder*) 2025-05-07T20:03:47.9220999Z U asmjit::_abi_1_13::x86::Assembler::~Assembler() 2025-05-07T20:03:47.9221372Z U cpuinfo_get_packages 2025-05-07T20:03:47.9221730Z U cpuinfo_get_packages_count 2025-05-07T20:03:47.9222059Z U cpuinfo_initialize 2025-05-07T20:03:47.9222394Z U cpuinfo_isa 2025-05-07T20:03:47.9222704Z U fma@GLIBC_2.2.5 2025-05-07T20:03:47.9223000Z U fmaf@GLIBC_2.2.5 2025-05-07T20:03:47.9223449Z U fminf@GLIBC_2.2.5 2025-05-07T20:03:47.9223749Z U free@GLIBC_2.2.5 2025-05-07T20:03:47.9224081Z U fwrite@GLIBC_2.2.5 2025-05-07T20:03:47.9224412Z U getenv@GLIBC_2.2.5 2025-05-07T20:03:47.9224733Z U log2@GLIBC_2.2.5 2025-05-07T20:03:47.9225026Z U log2f@GLIBC_2.2.5 2025-05-07T20:03:47.9225355Z U 
lrintf@GLIBC_2.2.5 2025-05-07T20:03:47.9225671Z U memcmp@GLIBC_2.2.5 2025-05-07T20:03:47.9226042Z U memcpy@GLIBC_2.14 2025-05-07T20:03:47.9226375Z U memmove@GLIBC_2.2.5 2025-05-07T20:03:47.9226682Z U memset@GLIBC_2.2.5 2025-05-07T20:03:47.9227015Z U nearbyint@GLIBC_2.2.5 2025-05-07T20:03:47.9227341Z U nearbyintf@GLIBC_2.2.5 2025-05-07T20:03:47.9243702Z U operator delete(void*, unsigned long)@CXXABI_1.3.9 2025-05-07T20:03:47.9244277Z U operator delete[](void*)@GLIBCXX_3.4 2025-05-07T20:03:47.9244691Z U operator new(unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:47.9245107Z U operator new[](unsigned long)@GLIBCXX_3.4 2025-05-07T20:03:47.9245508Z U posix_memalign@GLIBC_2.2.5 2025-05-07T20:03:47.9245867Z U pow@GLIBC_2.2.5 2025-05-07T20:03:47.9246171Z U sqrtf@GLIBC_2.2.5 2025-05-07T20:03:47.9246744Z U std::_Hash_bytes(void const*, unsigned long, unsigned long)@CXXABI_1.3.5 2025-05-07T20:03:47.9247272Z U std::_Rb_tree_decrement(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:47.9247781Z U std::_Rb_tree_increment(std::_Rb_tree_node_base*)@GLIBCXX_3.4 2025-05-07T20:03:47.9248483Z U std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&)@GLIBCXX_3.4 2025-05-07T20:03:47.9249235Z U std::__atomic_futex_unsigned_base::_M_futex_notify_all(unsigned int*)@GLIBCXX_3.4.21 2025-05-07T20:03:47.9250331Z U std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration >, std::chrono::duration >)@GLIBCXX_3.4.21 2025-05-07T20:03:47.9251513Z U std::__detail::_Prime_rehash_policy::_M_need_rehash(unsigned long, unsigned long, unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:47.9252249Z U std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const@GLIBCXX_3.4.18 2025-05-07T20:03:47.9252799Z U std::__exception_ptr::exception_ptr::_M_addref() 2025-05-07T20:03:47.9253492Z U std::__exception_ptr::exception_ptr::_M_release() 2025-05-07T20:03:47.9254014Z U 
std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11 2025-05-07T20:03:47.9254579Z U std::__future_base::_Result_base::_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:47.9255077Z U std::__future_base::_Result_base::~_Result_base()@GLIBCXX_3.4.15 2025-05-07T20:03:47.9255532Z U std::__once_call@GLIBCXX_3.4.11 2025-05-07T20:03:47.9255894Z U std::__once_callable@GLIBCXX_3.4.11 2025-05-07T20:03:47.9256289Z U std::__throw_bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:47.9256684Z U std::__throw_bad_array_new_length() 2025-05-07T20:03:47.9257043Z U std::__throw_bad_cast()@GLIBCXX_3.4 2025-05-07T20:03:47.9257447Z U std::__throw_bad_function_call()@GLIBCXX_3.4.14 2025-05-07T20:03:47.9257856Z U std::__throw_future_error(int)@GLIBCXX_3.4.14 2025-05-07T20:03:47.9258282Z U std::__throw_length_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.9258695Z U std::__throw_logic_error(char const*)@GLIBCXX_3.4 2025-05-07T20:03:47.9259123Z U std::__throw_system_error(int)@GLIBCXX_3.4.11 2025-05-07T20:03:47.9259532Z U std::bad_alloc::~bad_alloc()@GLIBCXX_3.4 2025-05-07T20:03:47.9260368Z U std::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.9261198Z U std::cerr@GLIBCXX_3.4 2025-05-07T20:03:47.9261520Z U std::cout@GLIBCXX_3.4 2025-05-07T20:03:47.9261923Z U std::ctype::_M_widen_init() const@GLIBCXX_3.4.11 2025-05-07T20:03:47.9262399Z U std::future_category()@GLIBCXX_3.4.15 2025-05-07T20:03:47.9262799Z U std::future_error::~future_error()@GLIBCXX_3.4.14 2025-05-07T20:03:47.9263353Z U std::ios_base::Init::Init()@GLIBCXX_3.4 2025-05-07T20:03:47.9263734Z U std::ios_base::Init::~Init()@GLIBCXX_3.4 2025-05-07T20:03:47.9264422Z U std::logic_error::logic_error(std::__cxx11::basic_string, std::allocator > const&)@GLIBCXX_3.4.21 2025-05-07T20:03:47.9265194Z U std::logic_error::logic_error(std::logic_error const&)@GLIBCXX_3.4.21 2025-05-07T20:03:47.9265738Z U std::ostream& std::ostream::_M_insert(double)@GLIBCXX_3.4.9 
2025-05-07T20:03:47.9266289Z U std::ostream& std::ostream::_M_insert(long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.9266876Z U std::ostream& std::ostream::_M_insert(unsigned long)@GLIBCXX_3.4.9 2025-05-07T20:03:47.9267400Z U std::ostream::flush()@GLIBCXX_3.4 2025-05-07T20:03:47.9267807Z U std::ostream::operator<<(int)@GLIBCXX_3.4 2025-05-07T20:03:47.9268189Z U std::ostream::put(char)@GLIBCXX_3.4 2025-05-07T20:03:47.9268678Z U std::rethrow_exception(std::__exception_ptr::exception_ptr)@CXXABI_1.3.3 2025-05-07T20:03:47.9269222Z U std::runtime_error::runtime_error(char const*)@GLIBCXX_3.4.21 2025-05-07T20:03:47.9269731Z U std::runtime_error::~runtime_error()@GLIBCXX_3.4 2025-05-07T20:03:47.9270137Z U std::terminate()@GLIBCXX_3.4 2025-05-07T20:03:47.9270472Z U stderr@GLIBC_2.2.5 2025-05-07T20:03:47.9270812Z U strcmp@GLIBC_2.2.5 2025-05-07T20:03:47.9271119Z U strlen@GLIBC_2.2.5 2025-05-07T20:03:47.9271451Z U strstr@GLIBC_2.2.5 2025-05-07T20:03:47.9271764Z U tolower@GLIBC_2.2.5 2025-05-07T20:03:47.9272109Z U toupper@GLIBC_2.2.5 2025-05-07T20:03:47.9272512Z U typeinfo for std::__future_base::_Result_base@GLIBCXX_3.4.15 2025-05-07T20:03:47.9273021Z U typeinfo for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:47.9273454Z U typeinfo for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:47.9273860Z U typeinfo for std::runtime_error@GLIBCXX_3.4 2025-05-07T20:03:47.9274309Z U vtable for __cxxabiv1::__class_type_info@CXXABI_1.3 2025-05-07T20:03:47.9274761Z U vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3 2025-05-07T20:03:47.9275201Z U vtable for std::bad_alloc@GLIBCXX_3.4 2025-05-07T20:03:47.9275612Z U vtable for std::future_error@GLIBCXX_3.4.14 2025-05-07T20:03:47.9275988Z w _ITM_deregisterTMCloneTable 2025-05-07T20:03:47.9276368Z w _ITM_registerTMCloneTable 2025-05-07T20:03:47.9276711Z w __cxa_finalize@GLIBC_2.2.5 2025-05-07T20:03:47.9277069Z w __gmon_start__ 2025-05-07T20:03:47.9277369Z w __pthread_key_create 2025-05-07T20:03:47.9277725Z w pthread_mutex_lock@GLIBC_2.2.5 
2025-05-07T20:03:47.9278081Z w pthread_mutex_unlock@GLIBC_2.2.5 2025-05-07T20:03:47.9278448Z w pthread_once 2025-05-07T20:03:47.9278765Z w pthread_rwlock_rdlock 2025-05-07T20:03:47.9279086Z w pthread_rwlock_unlock 2025-05-07T20:03:47.9279427Z w pthread_rwlock_wrlock 2025-05-07T20:03:47.9279749Z w pthread_self@GLIBC_2.2.5 2025-05-07T20:03:47.9280144Z [CHECK] Listing out external shared libraries linked: 2025-05-07T20:03:47.9280565Z + ldd ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.9280853Z 2025-05-07T20:03:47.9280997Z linux-vdso.so.1 (0x00007ffcfe1e0000) 2025-05-07T20:03:47.9281336Z libc10.so => not found 2025-05-07T20:03:47.9281878Z asmjit.so => /__w/FBGEMM/FBGEMM/fbgemm_gpu/./_skbuild/linux-x86_64-3.12/cmake-build/asmjit.so (0x00007fe69e785000) 2025-05-07T20:03:47.9282525Z libtorch.so => not found 2025-05-07T20:03:47.9282798Z libtorch_cpu.so => not found 2025-05-07T20:03:47.9283120Z libtorch_cuda.so => not found 2025-05-07T20:03:47.9283476Z libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe69e521000) 2025-05-07T20:03:47.9283918Z libm.so.6 => /lib64/libm.so.6 (0x00007fe69e446000) 2025-05-07T20:03:47.9284335Z libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe69ed51000) 2025-05-07T20:03:47.9284769Z libc.so.6 => /lib64/libc.so.6 (0x00007fe69e23e000) 2025-05-07T20:03:47.9285177Z /lib64/ld-linux-x86-64.so.2 (0x00007fe69ed85000) 2025-05-07T20:03:47.9285533Z libtorch_cpu.so => not found 2025-05-07T20:03:47.9285849Z libtorch_cuda.so => not found 2025-05-07T20:03:47.9286135Z libtorch.so => not found 2025-05-07T20:03:47.9286321Z 2025-05-07T20:03:47.9286434Z [CHECK] Displaying ELF information: 2025-05-07T20:03:47.9286865Z + readelf -d ./_skbuild/linux-x86_64-3.12/cmake-build/fbgemm.so 2025-05-07T20:03:47.9287176Z 2025-05-07T20:03:47.9323630Z 2025-05-07T20:03:47.9324372Z Dynamic section at offset 0x51fb38 contains 38 entries: 2025-05-07T20:03:47.9325587Z Tag Type Name/Value 2025-05-07T20:03:47.9326844Z 0x0000000000000001 (NEEDED) Shared library: 
[libc10.so] 2025-05-07T20:03:47.9328313Z 0x0000000000000001 (NEEDED) Shared library: [asmjit.so] 2025-05-07T20:03:47.9330071Z 0x0000000000000001 (NEEDED) Shared library: [libtorch.so] 2025-05-07T20:03:47.9331540Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cpu.so] 2025-05-07T20:03:47.9332949Z 0x0000000000000001 (NEEDED) Shared library: [libtorch_cuda.so] 2025-05-07T20:03:47.9333469Z 0x0000000000000001 (NEEDED) Shared library: [libstdc++.so.6] 2025-05-07T20:03:47.9333993Z 0x0000000000000001 (NEEDED) Shared library: [libm.so.6] 2025-05-07T20:03:47.9334505Z 0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1] 2025-05-07T20:03:47.9335034Z 0x0000000000000001 (NEEDED) Shared library: [libc.so.6] 2025-05-07T20:03:47.9335661Z 0x0000000000000001 (NEEDED) Shared library: [ld-linux-x86-64.so.2] 2025-05-07T20:03:47.9336210Z 0x000000000000000e (SONAME) Library soname: [fbgemm.so] 2025-05-07T20:03:47.9336746Z 0x000000000000000f (RPATH) Library rpath: [$ORIGIN] 2025-05-07T20:03:47.9337182Z 0x000000000000000c (INIT) 0xf6000 2025-05-07T20:03:47.9337577Z 0x000000000000000d (FINI) 0x4c8fb0 2025-05-07T20:03:47.9337950Z 0x0000000000000019 (INIT_ARRAY) 0x51dac0 2025-05-07T20:03:47.9338413Z 0x000000000000001b (INIT_ARRAYSZ) 56 (bytes) 2025-05-07T20:03:47.9338813Z 0x000000000000001a (FINI_ARRAY) 0x51daf8 2025-05-07T20:03:47.9339193Z 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 2025-05-07T20:03:47.9339535Z 0x0000000000000004 (HASH) 0x238 2025-05-07T20:03:47.9339890Z 0x000000006ffffef5 (GNU_HASH) 0x6e20 2025-05-07T20:03:47.9340232Z 0x0000000000000005 (STRTAB) 0x2b0a0 2025-05-07T20:03:47.9340588Z 0x0000000000000006 (SYMTAB) 0xe7e0 2025-05-07T20:03:47.9340947Z 0x000000000000000a (STRSZ) 708057 (bytes) 2025-05-07T20:03:47.9341377Z 0x000000000000000b (SYMENT) 24 (bytes) 2025-05-07T20:03:47.9341731Z 0x0000000000000003 (PLTGOT) 0x520dd8 2025-05-07T20:03:47.9342122Z 0x0000000000000002 (PLTRELSZ) 24312 (bytes) 2025-05-07T20:03:47.9342498Z 0x0000000000000014 (PLTREL) RELA 
2025-05-07T20:03:47.9342828Z 0x0000000000000017 (JMPREL) 0xef8e0 2025-05-07T20:03:47.9343312Z 0x0000000000000007 (RELA) 0xda610 2025-05-07T20:03:47.9343682Z 0x0000000000000008 (RELASZ) 86736 (bytes) 2025-05-07T20:03:47.9344080Z 0x0000000000000009 (RELAENT) 24 (bytes) 2025-05-07T20:03:47.9344468Z 0x0000000000000018 (BIND_NOW) 2025-05-07T20:03:47.9344839Z 0x000000006ffffffb (FLAGS_1) Flags: NOW 2025-05-07T20:03:47.9345299Z 0x000000006ffffffe (VERNEED) 0xda490 2025-05-07T20:03:47.9345669Z 0x000000006fffffff (VERNEEDNUM) 5 2025-05-07T20:03:47.9346032Z 0x000000006ffffff0 (VERSYM) 0xd7e7a 2025-05-07T20:03:47.9346370Z 0x000000006ffffff9 (RELACOUNT) 9 2025-05-07T20:03:47.9346720Z 0x0000000000000000 (NULL) 0x0 2025-05-07T20:03:47.9346938Z 2025-05-07T20:03:47.9347063Z ################################################################################ 2025-05-07T20:03:47.9347326Z 2025-05-07T20:03:47.9347330Z 2025-05-07T20:03:47.9347561Z [CHECK] Verifying sample subset of symbols in the built libraries ... 2025-05-07T20:03:47.9520466Z [CHECK] Found symbol in ./_skbuild/linux-x86_64-3.12/cmake-build/experimental/gen_ai/fbgemm_gpu_experimental_gen_ai.so: fbgemm_gpu::per_tensor_quantize_i8 2025-05-07T20:03:47.9521861Z ################################################################################ 2025-05-07T20:03:47.9522551Z [BUILD] Wheel Audit: dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:47.9522984Z 2025-05-07T20:03:47.9530315Z + conda run --no-capture-output -n build_binary auditwheel show dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:47.9532052Z 2025-05-07T20:03:51.4605583Z 2025-05-07T20:03:51.4606347Z fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.4607338Z is consistent with the following platform tag: "linux_x86_64". 
2025-05-07T20:03:51.4607676Z 2025-05-07T20:03:51.4607853Z The wheel references external versioned symbols in these 2025-05-07T20:03:51.4608360Z system-provided shared libraries: libgcc_s.so.1 with versions 2025-05-07T20:03:51.4609014Z {'GCC_3.0', 'GCC_3.4'}, libstdc++.so.6 with versions {'CXXABI_1.3.5', 2025-05-07T20:03:51.4609532Z 'CXXABI_1.3.11', 'GLIBCXX_3.4.18', 'GLIBCXX_3.4.9', 'GLIBCXX_3.4.14', 2025-05-07T20:03:51.4610021Z 'GLIBCXX_3.4', 'GLIBCXX_3.4.15', 'GLIBCXX_3.4.29', 'CXXABI_1.3.8', 2025-05-07T20:03:51.4610537Z 'CXXABI_1.3.7', 'GLIBCXX_3.4.11', 'GLIBCXX_3.4.21', 'GLIBCXX_3.4.20', 2025-05-07T20:03:51.4611120Z 'CXXABI_1.3', 'CXXABI_1.3.3', 'CXXABI_1.3.9'}, libc.so.6 with versions 2025-05-07T20:03:51.4611590Z {'GLIBC_2.14', 'GLIBC_2.2.5'}, libm.so.6 with versions 2025-05-07T20:03:51.4612053Z {'GLIBC_2.2.5'}, libcudart.so.12 with versions {'libcudart.so.12'} 2025-05-07T20:03:51.4612362Z 2025-05-07T20:03:51.4612579Z This constrains the platform tag to "manylinux_2_34_x86_64". In order 2025-05-07T20:03:51.4613132Z to achieve a more compatible tag, you would need to recompile a new 2025-05-07T20:03:51.4613625Z wheel from source on a system with earlier versions of these 2025-05-07T20:03:51.4614074Z libraries, such as a recent manylinux image. 2025-05-07T20:03:51.5540333Z 2025-05-07T20:03:51.5540353Z 2025-05-07T20:03:51.5541165Z ################################################################################ 2025-05-07T20:03:51.5542121Z [BUILD] Enumerating the built wheels ... 2025-05-07T20:03:51.5542673Z + ls -lth dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.5543068Z 2025-05-07T20:03:51.5609176Z -rw-r--r--. 1 root root 18M May 7 20:03 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.5609706Z 2025-05-07T20:03:51.5609890Z [BUILD] Enumerating the wheel SHAs ... 
2025-05-07T20:03:51.5610439Z + sha1sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.5610834Z 2025-05-07T20:03:51.5977123Z c73a702bbc09a0f1f522be4fc10889dc19360f75 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.5978916Z 2025-05-07T20:03:51.5987954Z + sha256sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.5989152Z 2025-05-07T20:03:51.6790953Z 3a160ecc54665559cce7e57cc15438640cf521df66903a79480f30a5b3cf6942 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.6791892Z 2025-05-07T20:03:51.6797744Z + md5sum dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.6798868Z 2025-05-07T20:03:51.7113435Z e7438d9eb3f38b23c683d9c8a7a66fd4 dist/fbgemm_gpu_genai_nightly-2025.5.7-cp312-cp312-manylinux_2_28_x86_64.whl 2025-05-07T20:03:51.7114008Z 2025-05-07T20:03:51.7114153Z [BUILD] FBGEMM-GPU build + package completed 2025-05-07T20:03:51.8569738Z ##[group]Run actions/upload-artifact@v4 2025-05-07T20:03:51.8570103Z with: 2025-05-07T20:03:51.8570362Z name: fbgemm_genai_x86_gcc_py3.12_cu12.8.0.whl 2025-05-07T20:03:51.8570686Z path: fbgemm_gpu/dist/*.whl 2025-05-07T20:03:51.8570974Z if-no-files-found: error 2025-05-07T20:03:51.8571228Z compression-level: 6 2025-05-07T20:03:51.8571474Z overwrite: false 2025-05-07T20:03:51.8571703Z include-hidden-files: false 2025-05-07T20:03:51.8571968Z env: 2025-05-07T20:03:51.8572193Z PRELUDE: .github/scripts/setup_env.bash 2025-05-07T20:03:51.8572512Z BUILD_ENV: build_binary 2025-05-07T20:03:51.8572762Z BUILD_TARGET: genai 2025-05-07T20:03:51.8572987Z BUILD_VARIANT: cuda 2025-05-07T20:03:51.8573233Z BUILD_CUDA_VERSION: 12.8.0 2025-05-07T20:03:51.8573475Z ##[endgroup] 2025-05-07T20:03:51.8592485Z ##[command]/usr/bin/docker exec 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce sh -c "cat 
/etc/*release | grep ^ID" 2025-05-07T20:03:52.7982983Z With the provided path, there will be 1 file uploaded 2025-05-07T20:03:52.7985221Z Artifact name is valid! 2025-05-07T20:03:52.7986832Z Root directory input is valid! 2025-05-07T20:03:52.9054070Z Beginning upload of artifact content to blob storage 2025-05-07T20:03:53.7688151Z Uploaded bytes 8388608 2025-05-07T20:03:53.9833746Z Uploaded bytes 16777216 2025-05-07T20:03:54.0700404Z Uploaded bytes 18498190 2025-05-07T20:03:54.0860706Z Finished uploading artifact content to blob storage! 2025-05-07T20:03:54.0862402Z SHA256 digest of uploaded artifact zip is 44a8371d786eb18d4cfaf0c12983918cf9c0bfea6fa4b0e46e2bab9751f50039 2025-05-07T20:03:54.0863063Z Finalizing artifact upload 2025-05-07T20:03:54.1524323Z Artifact fbgemm_genai_x86_gcc_py3.12_cu12.8.0.whl.zip successfully finalized. Artifact ID 3081407199 2025-05-07T20:03:54.1527054Z Artifact fbgemm_genai_x86_gcc_py3.12_cu12.8.0.whl has been successfully uploaded! Final size is 18498190 bytes. Artifact ID is 3081407199 2025-05-07T20:03:54.1529070Z Artifact download URL: https://github.com/pytorch/FBGEMM/actions/runs/14891846252/artifacts/3081407199 2025-05-07T20:03:54.1775906Z Post job cleanup. 
2025-05-07T20:03:54.1787740Z ##[command]/usr/bin/docker exec 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce sh -c "cat /etc/*release | grep ^ID" 2025-05-07T20:03:54.5024167Z [command]/usr/bin/git version 2025-05-07T20:03:54.5279983Z git version 2.47.1 2025-05-07T20:03:54.5322123Z Copying '/github/home/.gitconfig' to '/__w/_temp/668f2c9f-5d3c-4bab-9cc3-0e57fb530113/.gitconfig' 2025-05-07T20:03:54.5338818Z Temporarily overriding HOME='/__w/_temp/668f2c9f-5d3c-4bab-9cc3-0e57fb530113' before making global git config changes 2025-05-07T20:03:54.5341280Z Adding repository directory to the temporary git global config as a safe directory 2025-05-07T20:03:54.5344283Z [command]/usr/bin/git config --global --add safe.directory /__w/FBGEMM/FBGEMM 2025-05-07T20:03:54.5401323Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-05-07T20:03:54.5429092Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-05-07T20:03:54.5993073Z Entering 'external/asmjit' 2025-05-07T20:03:54.6119032Z Entering 'external/composable_kernel' 2025-05-07T20:03:54.6278127Z Entering 'external/cpuinfo' 2025-05-07T20:03:54.6380630Z Entering 'external/cutlass' 2025-05-07T20:03:54.6566054Z Entering 'external/googletest' 2025-05-07T20:03:54.6673142Z Entering 'external/hipify_torch' 2025-05-07T20:03:54.6786190Z Entering 'external/json' 2025-05-07T20:03:54.6870156Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-05-07T20:03:54.6890078Z http.https://github.com/.extraheader 2025-05-07T20:03:54.6896337Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-05-07T20:03:54.6922722Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git 
config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-05-07T20:03:54.7201474Z Entering 'external/asmjit' 2025-05-07T20:03:54.7231466Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7276887Z Entering 'external/composable_kernel' 2025-05-07T20:03:54.7325213Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7374680Z Entering 'external/cpuinfo' 2025-05-07T20:03:54.7408151Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7457005Z Entering 'external/cutlass' 2025-05-07T20:03:54.7490716Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7549528Z Entering 'external/googletest' 2025-05-07T20:03:54.7584819Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7626059Z Entering 'external/hipify_torch' 2025-05-07T20:03:54.7678429Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7715509Z Entering 'external/json' 2025-05-07T20:03:54.7765027Z http.https://github.com/.extraheader 2025-05-07T20:03:54.7999052Z Stop and remove container: 78fd0503c2694dd883e9ecfdef44701d_amazonlinux2023_f768cc 2025-05-07T20:03:54.8004280Z ##[command]/usr/bin/docker rm --force 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce 2025-05-07T20:03:56.0970927Z 3634184bf123dceeb3f825b9e1bbe7817ab6e81e701296975297ae3260d158ce 2025-05-07T20:03:56.1012344Z Remove container network: github_network_3a00299775bc40d0bdf0cd164938c757 2025-05-07T20:03:56.1017030Z ##[command]/usr/bin/docker network rm github_network_3a00299775bc40d0bdf0cd164938c757 2025-05-07T20:03:57.0381110Z github_network_3a00299775bc40d0bdf0cd164938c757 2025-05-07T20:03:57.0420081Z A job completed hook has been configured by the self-hosted runner administrator 2025-05-07T20:03:57.0628440Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-05-07T20:03:57.0634217Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-05-07T20:03:57.0634638Z ##[endgroup] 2025-05-07T20:04:09.1429837Z Cleaning up orphan processes